I personally still really like the lazy DFA approach. Especially for the single-...

glangdale · on June 15, 2021

I'm OK with laziness, but lazy != "lazy DFA". I can't help but think people keep building RE2-style DFA constructions because they don't know how to run NFAs efficiently. The idea that your automata has to keep allocating memory is pretty weird, especially in MT land.

What I'm thinking about lately is sticking a lot closer to the original regular expression parse tree when implementing things. Yes, that leaves performance on the table relative to Hyperscan, but I suspect the compile time could be extremely good. Also, it would be better suited to stuff like capturing and back-references.

Like I said, I suspect I'll be building "Ultimate Engine the Third" sometime in the not-too-distant future (Hyperscan's internal name was "Ultimate Engine the Second", an Iain M. Banks reference).

burntsushi · on June 15, 2021

Yeah, we've had this conversation before. :-) I look forward to seeing what you come up with!