
I dunno, the "logic" solution seems like the obvious one to me; if your boss really has that much trouble with propositional logic that they don't immediately see why it works, well, that's what code comments are for.

(...the trick is still cool, though; I can imagine other situations where it would be more useful. However it does seem like it potentially depends on the particular regex engine being used, in contrast to the author's claim about it being totally portable; yes, it'll compile on anything, but will it work?)




How could it not work? I've regularly relied on the order of matching, and never found an environment that didn't test left-to-right for the `|` operator in regex.
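For what it's worth, here is a minimal sketch of the behavior being relied on, using Python's `re` module as one example of a backtracking engine. It only shows what this one engine does and says nothing about engines with other matching rules.

```python
import re

# Python's `re` is a backtracking engine: alternatives are tried left to
# right, and the first one that lets the overall match succeed wins.
first = re.match(r"a|ab", "ab").group()
swapped = re.match(r"ab|a", "ab").group()

print(first)    # a   (the shorter alternative wins because it is listed first)
print(swapped)  # ab  (reordering the alternatives changes the result)
```

So in this engine the order of the alternatives is observable, which is exactly why portability depends on every engine agreeing on that rule.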


> operator in regex.

Regex is not the same thing as regular expressions. If the engine matches with an NFA, then it is matching all the alternatives simultaneously rather than trying them one after another.

Russ Cox has good pictures explaining the idea in the 'Regular Expression Search Algorithms' section of <https://swtch.com/~rsc/regexp/regexp1.html>
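A rough sketch of what "matching all alternatives simultaneously" means, restricted to alternations of literal strings to keep it short. This is a simplified illustration and not code from the linked article; the function name `simulate_alternatives` is made up for this example.

```python
def simulate_alternatives(alternatives, text):
    """Advance every literal alternative in lockstep over `text`.

    Returns (end_offset, alternative) for each alternative that matches
    as a prefix of `text`, ordered by where the match ends.
    """
    # Each live "thread" is (index of alternative, characters matched so far).
    threads = [(i, 0) for i in range(len(alternatives))]
    matches = []
    for pos in range(len(text) + 1):
        survivors = []
        for i, matched in threads:
            alt = alternatives[i]
            if matched == len(alt):
                # This alternative has been fully consumed: record the match.
                matches.append((pos, alt))
            elif pos < len(text) and alt[matched] == text[pos]:
                # Still alive: step this thread forward by one character.
                survivors.append((i, matched + 1))
        threads = survivors
        if not threads:
            break
    return matches


print(simulate_alternatives(["a", "ab"], "ab"))  # [(1, 'a'), (2, 'ab')]
```

The simulation reads the input once and finds both candidate matches. Which one gets reported is a separate policy decision (first listed, or longest), and that is where engines can differ.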


I'm talking about regex. Regex libraries in practical use do not use NFAs. I'm talking about actual code that's written using normal languages. I'm familiar with the difference between "regex" and "regular expressions" in the "regular languages" sense.


Lex/Flex, which I think we can agree is used by "actual code that's written using normal languages", uses DFAs, both inside rules and between rules, and does not try '|' cases left to right. (They probably could have if they wanted, since there is a REJECT action that already forces them to store the list of all the rules/texts that were matched.) For example:

    a|ab { std::cout << "matched ab" << std::endl; }
    b    { std::cout << "matched b" << std::endl; }

If given "ab" as input, this matches the first rule against the whole of "ab" (the longest match), rather than matching the first rule with "a" and then the second rule with "b".
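To make the difference concrete, here is a small Python sketch of the two policies over literal alternatives. It is only a model of the longest-match rule described above, not how Lex/Flex is implemented; the `tokenize` function and the rule names are invented for the example.

```python
def tokenize(rules, text, longest=True):
    """Split `text` into (rule_name, lexeme) pairs.

    `rules` is a list of (name, [literal alternatives]).
    longest=True  models lex-style longest match (ties go to the earlier rule).
    longest=False models "take the first alternative that matches".
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (rule name, lexeme) chosen at this position
        for name, alternatives in rules:
            for alt in alternatives:
                if not alt or not text.startswith(alt, pos):
                    continue
                if best is None:
                    best = (name, alt)
                elif longest and len(alt) > len(best[1]):
                    best = (name, alt)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


rules = [("rule 1", ["a", "ab"]), ("rule 2", ["b"])]
print(tokenize(rules, "ab", longest=True))   # [('rule 1', 'ab')]
print(tokenize(rules, "ab", longest=False))  # [('rule 1', 'a'), ('rule 2', 'b')]
```

Same rules, same input, two different tokenizations, depending only on which policy the engine applies to `|`.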


All POSIX-compatible regex engines do the same, since POSIX specifies leftmost-longest matching. It's somewhat linked to why POSIX regexes don't have non-greedy operators.

But DFAs can implement the preference-order semantics found in backtracking regex engines too. Russ Cox's articles show how to do that.
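A hedged sketch of the idea, shown as a lockstep NFA-style simulation over literal alternatives rather than a real DFA: keep the live threads in priority order, and when one of them finishes, drop everything of lower priority. This is my own simplified illustration (the name `leftmost_first` is made up), not code taken from the articles.

```python
def leftmost_first(alternatives, text):
    """Return the prefix match a backtracking engine would pick for
    alt1|alt2|... using a single left-to-right pass with no backtracking.
    Returns None if no alternative matches at the start of `text`.
    """
    # Threads are kept in priority order: earlier alternatives come first.
    threads = [(i, 0) for i in range(len(alternatives))]
    best = None
    for pos in range(len(text) + 1):
        survivors = []
        for i, matched in threads:
            alt = alternatives[i]
            if matched == len(alt):
                # A thread finished. Everything after it in the list has
                # lower priority, so it is cut off. Higher-priority threads
                # already in `survivors` keep running and may override this.
                best = alt
                break
            if pos < len(text) and alt[matched] == text[pos]:
                survivors.append((i, matched + 1))
        threads = survivors
        if not threads:
            break
    return best


print(leftmost_first(["a", "ab"], "ab"))  # a
print(leftmost_first(["ab", "a"], "ab"))  # ab
```

Both results agree with what a backtracking engine reports for `a|ab` and `ab|a` on "ab", even though the input is scanned only once.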

(Just adding some additional info to your point.)


Go's regexp package, Rust's regex crate and RE2 are examples of regex engines that are very much in practical use that use NFAs (among other things).


PCRE is a pretty well-defined standard, isn't it? And it's the one used by most of the languages I've worked with, including in MariaDB.


It doesn’t even rely on PCRE, just core regex.



