This is interesting and I’d like to see a follow-up from Cursor, but the tone is unbearable and it egregiously misrepresents the Cursor blog post, I guess for a circle of followers who won’t bother to check the original anyway and are just there for the dunking.
> So how cursor came up with such a beautiful solution only in 2026? Is everyone around dumb and never did anything like this before?
The Cursor post doesn’t claim anything is original; they attribute every approach discussed to someone else, including the one they say they settled on:
> Here's another very smart idea. You may have seen it used in ClickHouse for their regular expression operator, and also at GitHub, in the new Code Search feature that shipped a couple years ago and which does allow matching regular expressions. It's called Sparse N-grams, and it is the sweetest of the middle grounds.
The very next sentence in the fff article is amusing:
> No, actually all the theory in the blog post they made (that makes sense) is coming from the paper https://swtch.com/~rsc/regexp/regexp4.html that is stated behind google code search project.
Amusing because 1. the paper is prominently cited in the original, and 2. no, it doesn’t cover all the subsequent optimizations discussed. “That makes sense” is doing a lot of work there, apparently.
Now, the main claims in the fff article are:
- Few/no people need to search entire repos that large;
- For large repos (that no one needs to search), fff’s index is smaller (~100MB for Chromium vs ~1GB for Cursor’s), faster to create (~8s vs ~4m), and still fast to query (~100ms vs ?).
But all the comparisons are weirdly fixated on the MAX_FILE_SIZE query, which the original used purely to demonstrate the algorithm. That’s a fixed literal, hardly a fucking regex search. Readers have no idea how, say, MAX_.+_SIZE fares after reading that rebuttal.
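To be concrete about why the distinction matters: here’s a toy sketch (my own, not code from either project, and a plain trigram index rather than sparse n-grams) of the idea from the regexp4 paper. A literal like MAX_FILE_SIZE requires every one of its trigrams, so the index narrows candidates sharply; a regex like MAX_.+_SIZE only yields required trigrams from its literal fragments, so the index does less work and the regex engine must verify more candidates.

```python
import re

# Toy corpus; in reality these would be files in a large repo.
docs = {
    "a.c": "#define MAX_FILE_SIZE 4096",
    "b.c": "#define MAX_NAME_SIZE 255",
    "c.c": "int maximum = 10;",
}

def trigrams(s):
    """All 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

# Inverted index: trigram -> set of doc ids containing it.
index = {}
for doc_id, text in docs.items():
    for t in trigrams(text):
        index.setdefault(t, set()).add(doc_id)

def candidates(required):
    """Docs containing every trigram in `required` (an AND query)."""
    sets = [index.get(t, set()) for t in required]
    return set.intersection(*sets) if sets else set(docs)

# Literal query: all trigrams of MAX_FILE_SIZE are required,
# so the index alone pins down the answer.
lit = candidates(trigrams("MAX_FILE_SIZE"))

# Regex MAX_.+_SIZE: only the literal fragments "MAX_" and "_SIZE"
# contribute required trigrams; the index returns a looser candidate
# set and the regex engine must confirm each candidate.
required = trigrams("MAX_") | trigrams("_SIZE")
matches = {d for d in candidates(required)
           if re.search(r"MAX_.+_SIZE", docs[d])}
```

Here `lit` is just `{"a.c"}`, while the regex pass has to scan both `a.c` and `b.c` before matching them. A benchmark built only on the literal case measures the easy half of this pipeline.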
So, again: interesting, unbearable tone, egregious misrepresentation, would like a follow-up.
Disclosure: no affiliation, not using either now.