@ievev Have you ever seen an implementation like @bablr/regex? https://github.com/bablr-lang/regex-vm It's an NFA system, so it isn't going to win any awards for throughput, but in this particular case it does seem to avoid the complexity blowup entirely. It will exhaust your heap on really big inputs, though.
The strategy this engine uses is just to evolve the match state as a function of time. A match can be successfully completed, yet not be emitted, because some other match could still supersede it by being longer or more leftmost.
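Concretely, here's a rough hand-rolled sketch of that strategy specialized to /\d+\s+/ (my own toy illustration, not @bablr/regex's actual code, and `nfa_findall` is a made-up name):

```python
def nfa_findall(text):
    """Toy NFA-thread simulation of /\d+\s+/g, stepping one character at a time."""
    matches = []
    threads = []   # (start, state); state 0 = inside \d+, state 1 = inside \s+
    pending = None # (start, end) of a completed match, held back because a
                   # surviving thread could still produce a longer/leftmost one
    for i, ch in enumerate(text):
        survivors = []
        for start, state in threads:
            if state == 0 and ch.isdigit():
                survivors.append((start, 0))        # keep eating digits
            elif ch.isspace():
                survivors.append((start, 1))        # enter or stay in \s+
                if pending is None or start <= pending[0]:
                    pending = (start, i + 1)        # leftmost-longest so far
        if ch.isdigit():
            survivors.append((i, 0))                # a new candidate may begin here
        threads = survivors
        # emit only once no live thread could still supersede the pending match
        if pending is not None and all(s >= pending[1] for s, _ in threads):
            matches.append(text[pending[0]:pending[1]])
            pending = None
    if pending is not None:
        matches.append(text[pending[0]:pending[1]])
    return matches
```

This sketch is deliberately naive: on a long run of digits the list of live threads grows with the input, which is exactly the kind of heap growth I'm describing. Production NFA engines dedup threads per automaton state to bound that.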
I tried the pattern /d+s+/g on 10,000,000 digits followed by no space. It took 4 seconds to return no results. I tried it on 20,000,000 digits followed by no space. It took 8 seconds to return no results. I tried on 100,000,000 and I ran out of heap space.
Test setup: https://gist.github.com/conartist6/051838025af1e04d966e03aa9...
Do any RE implementations do anything like the query planning that databases do, or the rewrites that compilers do, that could replace the RE with a different sequence of REs or string searches that might execute faster?
For example, in the expression you tested (I'm assuming, based on your description of the test data, that /d+s+/ means the same as /\d+\s+/ in the RE engines I've used), any match must contain a digit immediately followed by whitespace.
A scan for every position where a digit is immediately followed by whitespace, with each such position then expanded to find the run of digits ending there and the run of whitespace starting there, should take linear time and constant heap space.
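Something like this rewrite, sketched by hand (hypothetical helper name, hand-written rather than engine-generated, and assuming the /\d+\s+/ reading of the pattern):

```python
def scan_digits_spaces(text):
    """Find all /\d+\s+/ matches by scanning for a digit/whitespace boundary,
    then expanding left over digits and right over whitespace."""
    matches, i, n = [], 0, len(text)
    while i < n - 1:
        if text[i].isdigit() and text[i + 1].isspace():
            left = i
            while left > 0 and text[left - 1].isdigit():
                left -= 1                    # extend back over the \d+ run
            right = i + 1
            while right < n and text[right].isspace():
                right += 1                   # extend forward over the \s+ run
            matches.append(text[left:right])
            i = right                        # resume past the match (no overlap)
        else:
            i += 1
    return matches
```

Each character is visited a bounded number of times and the only state kept between matches is the scan position, so this is linear time and constant extra space, as claimed.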
I have not heard of this before; I will have a look!
Returning no results is going to be linear in any DFA- or NFA-based implementation, though. You go character by character and confirm that there are no matches.
It's only when an engine has to return multiple matches that it runs into trouble and goes superlinear.
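For the pure "are there any matches at all" question, that character-by-character check reduces to a constant-space, DFA-style pass. A rough illustration for /\d+\s+/ (my own sketch, not any particular engine's code):

```python
def contains_digits_then_space(text):
    """Answer 'does text contain a match of /\d+\s+/ anywhere?' in one
    linear pass with O(1) state -- a no-match input costs exactly one
    visit per character."""
    state = 0          # 0 = nothing useful yet, 1 = just saw one or more digits
    for ch in text:
        if ch.isdigit():
            state = 1
        elif state == 1 and ch.isspace():
            return True            # a digit run followed by whitespace exists
        else:
            state = 0
    return False
```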