logoalt Hacker News

burakemirtoday at 12:21 PM0 repliesview on HN

I believe the technique described is similar to what I published here (this is not about all matches, but left-longest/shortest)

"Compiling regular expressions to sequential machines" (2005) ACM Symposium of Applied Computing https://dl.acm.org/doi/10.1145/1066677.1066992

(Note that there is a small mistake in the paper due to ambiguity, found by Vladimir Gapeyev. So the result does not hold in the generality stated but only for a special case when there is no ambiguity at "the end". There went my first PhD student publication...)

The two pass technique used to be implemented in the Scala compiler at the time (building DFAs upfront) , which could do regexps over lists and other sequences, but the approach would not work for top-down tree regexps so I did not pursue that and it got ripped out later.

It is good to see derivative regular expressions, Brzozowski/"position automata" used and discussed.