I personally still really like the lazy DFA approach. Especially for the single-pattern use case. In particular, it handles the case of large Unicode character classes quite well by avoiding the construction of huge DFAs unless the input actually calls for it.
With that said, I have longed for simple ways of composing regexes better. I've definitely fallen short of that, and I think it's causing me to miss a whole host of optimizations. I hope to devote some head space to that in the next year or so.
I'm OK with laziness, but lazy != "lazy DFA". I can't help but think people keep building RE2-style DFA constructions because they don't know how to run NFAs efficiently. The idea that your automata has to keep allocating memory is pretty weird, especially in MT land.
What I'm thinking about lately is sticking a lot closer to the original regular expression parse tree when implementing things. Yes, that leaves performance on the table relative to Hyperscan, but I suspect the compile time could be extremely good. Also, it would be better suited to stuff like capturing and back-references.
Like I said, I suspect I'll be building "Ultimate Engine the Third" sometime in the not-too-distant future (Hyperscan's internal name was "Ultimate Engine the Second", an Iain M. Banks reference).
With that said, I have longed for simple ways of composing regexes better. I've definitely fallen short of that, and I think it's causing me to miss a whole host of optimizations. I hope to devote some head space to that in the next year or so.