Hacker News

matt_trentini today at 6:01 AM

Yes, it was chosen for its small code size and low memory use. But it is limited in features (it lacks counted repetitions, for example):

https://docs.micropython.org/en/latest/library/re.html

so alternatives that provide additional features have been discussed: either extending the existing module or swapping to a more feature-rich library. Possibly that would only be done on larger micros that can afford the additional flash/memory, though that makes support more challenging.
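
For what it's worth, a bounded counted repetition can be spelled out by hand for an engine that lacks it. A minimal sketch: `expand_counted` is a hypothetical helper, not part of any library, and it assumes the atom is a single character, a character class, or a parenthesized group. The check below runs against CPython's `re`, not MicroPython's.

```python
import re


def expand_counted(atom, m, n):
    """Rewrite atom{m,n} using only concatenation and the ? operator.

    Assumes atom is one character, one character class, or one
    parenthesized group, so that appending ? applies to the whole atom.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    # m mandatory copies, followed by (n - m) optional copies.
    return atom * m + (atom + "?") * (n - m)


# Equivalent of [0-9]{2,4} without the counted-repetition syntax.
pattern = expand_counted("[0-9]", 2, 4)
for digits in ("7", "42", "2024", "12345"):
    print(digits, bool(re.fullmatch(pattern, digits)))
```

The expanded pattern grows linearly with n, so this only makes sense for small bounds.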


Replies

thaliaarchi today at 7:23 AM

I was talking about the performance, not the feature set. Russ Cox's re1 and the re1.5 fork include several engines, each using a different implementation strategy. re1 was written primarily for pedagogical reasons, so its minimality comes from that.

The engine chosen by MicroPython is vulnerable to catastrophic backtracking, and switching to the Pike VM implementation would fix that. Instead of backtracking in the text when the pattern doesn't match, the Pike VM visits each character of the text only once, advancing all of the states that are valid at that position in lockstep. To do so, it allocates a list of "threads" whose length is proportional to the number of states in the pattern (though patterns usually have relatively few states). Many security issues have resulted from regexp denial-of-service attacks, so this slight memory tradeoff might be worthwhile.
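
To make the difference concrete, here is a minimal sketch. It is not re1.5's or MicroPython's actual code: the opcodes are loosely modeled on re1's Char/Split/Jmp/Match instructions, the program for `(a|a)*b` is compiled by hand, and both matchers only answer whether the whole text matches (no submatches, no match priority). The `steps` counter is an assumption added purely for illustration.

```python
# Opcodes loosely modeled on re1's instruction set.
CHAR, SPLIT, JMP, MATCH = range(4)

# Hand-compiled program for the pattern (a|a)*b, anchored at both ends.
PROG = [
    (SPLIT, 1, 6),  # 0: enter the loop body or leave it
    (SPLIT, 2, 4),  # 1: choose one of the two alternatives
    (CHAR, "a"),    # 2
    (JMP, 5),       # 3
    (CHAR, "a"),    # 4
    (JMP, 0),       # 5: loop back
    (CHAR, "b"),    # 6
    (MATCH,),       # 7
]


def backtrack(prog, text, steps, pc=0, i=0):
    """Recursive backtracking matcher; steps[0] counts instructions run."""
    steps[0] += 1
    op = prog[pc]
    if op[0] == MATCH:
        return i == len(text)
    if op[0] == CHAR:
        if i < len(text) and text[i] == op[1]:
            return backtrack(prog, text, steps, pc + 1, i + 1)
        return False
    if op[0] == JMP:
        return backtrack(prog, text, steps, op[1], i)
    # SPLIT: try the first branch, then rewind the text and try the second.
    return (backtrack(prog, text, steps, op[1], i)
            or backtrack(prog, text, steps, op[2], i))


def add_thread(prog, threads, seen, pc):
    """Follow JMP/SPLIT from pc; queue each CHAR/MATCH state at most once."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == JMP:
        add_thread(prog, threads, seen, op[1])
    elif op[0] == SPLIT:
        add_thread(prog, threads, seen, op[1])
        add_thread(prog, threads, seen, op[2])
    else:
        threads.append(pc)


def pike(prog, text, steps):
    """Pike-style matcher: one pass over text, live states in lockstep."""
    current, seen = [], set()
    add_thread(prog, current, seen, 0)
    for ch in text:
        following, seen = [], set()
        for pc in current:  # never more than len(prog) threads
            steps[0] += 1
            op = prog[pc]
            if op[0] == CHAR and op[1] == ch:
                add_thread(prog, following, seen, pc + 1)
        current = following
    return any(prog[pc][0] == MATCH for pc in current)


text = "a" * 12  # no trailing "b", so both matchers must reject it
b_steps, p_steps = [0], [0]
print(backtrack(PROG, text, b_steps), b_steps[0])
print(pike(PROG, text, p_steps), p_steps[0])
```

On this input the backtracker's step count roughly doubles with each additional `a`, while the Pike-style loop does a bounded amount of work per character. The price is the two thread lists and the `seen` set, each capped at the number of instructions in the program.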

Since MicroPython has modified recursiveloop.c, those changes would need to be ported to pike.c. The fixes are small and none of the extra features depend on backtracking, so this should be easy.