logoalt Hacker News

burntsushitoday at 1:57 PM1 replyview on HN

> for simple string literals it will definitely lose to Hyperscan and Rust regex since they have a high effort left-to-right SIMD algorithm that we cannot easily use

I think "simple string literals" undersells it. I think that description works for engines like RE2 or Go's regex engine, but not Hyperscan or Rust regex. (And I would put Hyperscan in another category than even Rust regex.) Granted, it is arguably difficult to be succinct here since it's a heuristic with difficult-to-predict failure points. But something like: "patterns from which a small number of string literals can be extracted."


Replies

ievievtoday at 2:06 PM

yes, that is correct. also Rust's engine matches the full unicode spec as individual characters, whereas .NET's will chop emojis into two sometimes, so Rust at a disadvantage here.

something i've been also wondering is how does Harry (https://ieeexplore.ieee.org/document/10229022) compare to the Teddy algorithm, it's written by some of the same authors - i wonder if it's used in any engines outside of Hyperscan today.