logoalt Hacker News

masfuertetoday at 2:48 PM1 replyview on HN

Cheers. I was more confused by how you were doing multiple matches. So I read the paper, which describes the AllEnds algorithm. If I understand correctly, the reverse pass captures all of the match starts and these need to be remembered for the forward pass. Which is what you were showing above, but I didn't follow it.

So, once it gets going, a traditional engine can produce matches iteratively with no further allocation, but RE# requires allocation proportional to the total number of matches. And in return, it's very much faster and much easier to use (with intersection and complement).


Replies

ievievtoday at 3:08 PM

Yes, exactly correct

It's also beneficial to merge some of the matching locations into ranges where possible, so when `a*` matches a long sequence of '|a|a|a|a|a|', it can be represented as a range of (0,5), we do this to keep the lookaround internal states smaller in the engine.