
It's more a question of which ones can't be. There are some really nasty and not very obvious gotchas here; https://regular-expressions.mobi/catastrophic.html has a good dive into how, for example, backtracking combines with incautious regex design to produce exponential behavior in the length of input.
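To make the failure mode concrete, here is a standard illustration (not from the linked article, and using Python's backtracking `re` engine rather than Perl's, though both behave the same way): a nested quantifier that matches quickly but takes exponential time to *fail*.

```python
import re
import time

# Classic catastrophic pattern: nested quantifiers. Against a string of
# n 'a's with no final 'b', a backtracking engine must try on the order
# of 2^n ways to split the 'a's between the inner and outer '+' before
# it can report failure.
EVIL = re.compile(r"(a+)+b")

for n in (16, 19, 22):
    text = "a" * n  # no trailing 'b', so the match must fail
    start = time.perf_counter()
    assert EVIL.match(text) is None
    # Runtime grows roughly 8x for every 3 characters added.
    print(f"n={n}: {time.perf_counter() - start:.3f}s")
```

Matching input (`"aaab"`) is fast; only the failing case explodes, which is exactly why these bugs slip past casual testing.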

I don't have a hard and fast rule of my own about regex complexity, but I do have a strong intuition over what's now ca. 25 years of working with regexes dating back to initial exposure in Perl 5 as a high schooler. That intuition boils down more or less to the idea that, when a regex grows too complex to comprehend at a glance, it's time to start thinking hard about replacing it with a proper parser, especially if it's operating over (as yet) imperfectly sanitized user input.

Sure, it's maybe a little more work up front, at least until you get good at writing small, fast parsers - which doesn't take long, in my experience; formal training might make it easier still, but I've rarely felt the lack. In exchange for that small investment, you gain reliability and maintainability benefits throughout the lifetime of the code. Much of that benefit comes simply from no longer having to re-comprehend the hairball of punctuation that is any complex regex before you can modify it at all - something I was actually really good at, as recently as a decade or so ago. The expertise has since expired through disuse, and that's given me no cause for regret; the thing about being a regex expert is that it's a really good skill for writing unreadable and subtly dangerous code, and not much good for anything else. Unreadable and subtly dangerous code was fine when I was a kid doing solo projects for fun, where the worst that could happen was having to hit ^C. As an engineer on a team of engineers building software for production, it's not even something I want to be good at.
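As a hypothetical illustration of the trade being described: a small hand-rolled parser for semicolon-separated key=value pairs. Every step has a name, the error path is explicit, and there is no backtracking engine to reason about. (The format and function name here are invented for the example, not anything from the thread.)

```python
def parse_pairs(text: str) -> dict[str, str]:
    """Parse 'key=value; key2=value2' into a dict, failing fast on bad input."""
    pairs = {}
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate empty segments and trailing ';'
        key, sep, value = chunk.partition("=")
        if not sep or not key.strip():
            # Explicit, localized failure instead of a silent partial match.
            raise ValueError(f"malformed pair: {chunk!r}")
        pairs[key.strip()] = value.strip()
    return pairs

print(parse_pairs("charset=utf-8; boundary=xyz"))
# → {'charset': 'utf-8', 'boundary': 'xyz'}
```

The equivalent single regex is shorter, but every future change to it requires re-deriving what the whole pattern does; here each rule lives in a line you can edit in isolation.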




> That intuition boils down more or less to the idea that, when a regex grows too complex to comprehend at a glance, it's time to start thinking hard about replacing it with a proper parser

You can get some surprisingly complex yet readable regexes in Perl by using qr//x[1] and decomposing the pieces into smaller qr//s that are then interpolated into the final pattern, along with proper inline comments in the regexes themselves.

[1] https://perldoc.perl.org/perlre#/x-and-/xx
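The same decomposition technique translates outside Perl; here's a sketch of the analogous style in Python, where `re.VERBOSE` plays the role of `/x` and f-string interpolation stands in for `qr//` interpolation (the IPv4 example itself is illustrative, not from the comment above):

```python
import re

# Named sub-pattern, composed into a larger commented pattern below.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"  # one decimal byte, 0-255

# re.VERBOSE ignores whitespace and allows '#' comments, like Perl's /x.
# Note the doubled {{3}} so the f-string emits a literal regex {3}.
IPV4 = re.compile(rf"""
    ^
    {OCTET} (?: \. {OCTET} ){{3}}   # four octets separated by dots
    $
""", re.VERBOSE)

print(bool(IPV4.match("192.168.0.1")))  # True
print(bool(IPV4.match("256.1.1.1")))    # False
```

Each piece can be read, named, and unit-tested on its own, which is the readability win being claimed.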


You still have to reason about the whole thing, though. This doesn't make that any easier, but I bet it makes it feel easier.


Decomposition is a proven method for making complex code both feel and actually be easier to reason about.

Regexes are code.

Therefore, decomposition makes complex regexes both feel and actually be easier to reason about.


I don't see anything about qr//x that makes regexes built this way less vulnerable to the kind of exponential backtracking problem under discussion here.

I do see a great opportunity to inadvertently create exactly that kind of mishap: by assuming interpolated qr// substrings have the locality the syntax falsely suggests, you can compose a catastrophic pattern that is no easier, and potentially harder, to notice.
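A sketch of that failure mode, using Python interpolation as a stand-in for qr// (the sub-patterns here are invented for illustration): each piece is individually linear-time, but composing them nests quantifiers and reproduces the classic catastrophic shape.

```python
import re

# Harmless in isolation: WORD matches a run of letters in linear time.
WORD = r"[a-z]+"

# But interpolating WORD under another '+' silently nests the quantifiers,
# recreating the (a+)+ catastrophic-backtracking shape. Nothing at the
# composition site hints at the blow-up.
TRAP = re.compile(rf"(?:{WORD})+!")

print(TRAP.pattern)              # (?:[a-z]+)+!
print(bool(TRAP.match("abc!")))  # True - and fast on matching input
# On failing input the cost explodes: TRAP.match("a" * 30) would have to
# backtrack through roughly 2^29 ways to split the run before giving up,
# so don't run it casually.
```

The danger is invisible unless you mentally flatten the interpolation back into one pattern, which is exactly the work the decomposition was supposed to save.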

Write your code however you like, of course, including concatenating strings and passing the result to 'eval'. The last time I dealt with more Perl than a shell one-liner was around 2012, and that the language encourages this kind of thing is one of the reasons I'm glad of that.


Given that I write my code with a text editor that does nothing but concatenate strings that I input and then I pass it to a compiler or an interpreter, all of the code I write is concatenating strings and passing it to 'eval'.

And I use proper decomposition to keep it cognitively manageable. It's pretty clear that reasoning about composition is beyond you, but trust me that given two procedures that both do not have an undesirable property, one can rest assured that simple composition will not introduce that undesirable property.


Many things are beyond me. Perhaps it's to my good fortune that the generally low utility of gratuitous personal insults is not among them. Certainly the next technical discussion I see improved by such behavior will be the first.


Well then, in the interest of amity let me suggest that it would be to your good fortune to work on your self-awareness. But, should you prefer not to, then by all means, you do you.



