
I don't see the relevance of how regexes are written to the problem they had. The engineer didn't typo the regex, or have a hard time understanding what it would match.

Instead, they didn't understand the runtime performance of the regex, as it was implemented in their particular system. No amount of syntax can change that.
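To make the runtime point concrete, here is a minimal sketch of a naive backtracking matcher that counts its own steps. It is a toy written for illustration, not the engine from the postmortem: the `count_steps` name and the tiny supported syntax (literals, `.`, and `x*`) are assumptions of this sketch.

```python
from typing import Tuple


def count_steps(pattern: str, text: str) -> Tuple[bool, int]:
    """Match `pattern` at the start of `text` with naive backtracking.

    Supports only literal characters, '.', and 'x*' (hypothetical toy syntax).
    Returns (matched, number_of_match_attempts).
    """
    steps = 0

    def match_here(p: int, t: int) -> bool:
        nonlocal steps
        steps += 1
        if p == len(pattern):
            return True  # pattern exhausted; there is no end anchor
        if p + 1 < len(pattern) and pattern[p + 1] == "*":
            # Try every possible length for the starred item, shortest first.
            while True:
                if match_here(p + 2, t):
                    return True
                if t < len(text) and pattern[p] in (".", text[t]):
                    t += 1  # let the starred item consume one more character
                else:
                    return False
        if t < len(text) and pattern[p] in (".", text[t]):
            return match_here(p + 1, t + 1)
        return False

    return match_here(0, 0), steps


if __name__ == "__main__":
    # Input with no '=' forces the matcher to exhaust every alternative.
    for n in (10, 20, 40):
        text = "x" * n
        _, one_wildcard = count_steps(".*=.*", text)
        _, two_wildcards = count_steps(".*.*=.*", text)
        print(n, one_wildcard, two_wildcards)
```

In this toy, the pattern with one leading wildcard fails in a number of steps that grows linearly with the input, while the pattern with two adjacent wildcards grows quadratically. Both patterns read about equally well on the page, which is the point: the cost comes from how the engine executes them.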




> No amount of syntax can change that

A framework that lets you parse out what you want with well-written, "normal" code can produce something easier to understand and maintain, surfacing this type of bug in a more obvious way.

Cryptic syntax is the main reason I avoid regexes (particularly complex ones).

Too much obfuscation between the code you write and the steps your program will take. Granted, my concern doesn't apply to master craftsmen who truly understand the nuances of the tool, but in the real world those are few and far between.

P.S. I get that there was a lot more going on in this postmortem than just one rogue regex.


When you write regexes by hand, you can accidentally introduce an obviously backtracking pattern such as * .=. *. When they are composed programmatically, a program can analyze each regex group to find simple problems, and then combine the groups in ways that will avoid backtracking.
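A rough sketch of what that could look like in Python, assuming a hypothetical mini-builder (`Lit`, `AnyRun`, and `compose` are names invented for this example, not an existing library). It only catches the single simplest problem, two unbounded wildcards placed next to each other, and does not attempt the more general rewriting described above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A literal string, escaped when rendered (hypothetical building block)."""
    text: str

    def render(self) -> str:
        return re.escape(self.text)


@dataclass(frozen=True)
class AnyRun:
    """Zero or more of any character, i.e. '.*' (hypothetical building block)."""

    def render(self) -> str:
        return ".*"


def compose(*parts) -> "re.Pattern":
    """Join parts into one compiled regex, rejecting adjacent unbounded wildcards."""
    for left, right in zip(parts, parts[1:]):
        if isinstance(left, AnyRun) and isinstance(right, AnyRun):
            raise ValueError("adjacent unbounded wildcards invite heavy backtracking")
    return re.compile("".join(part.render() for part in parts))


if __name__ == "__main__":
    ok = compose(AnyRun(), Lit("="), AnyRun())
    print(bool(ok.match("key=value")))
    try:
        compose(AnyRun(), AnyRun(), Lit("="), AnyRun())
    except ValueError as err:
        print("rejected:", err)
```

Because the check runs on structured parts rather than on a finished string, it does not need to parse regex syntax at all, which is the main appeal of composing instead of hand-writing.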

This isn't even why you should compose them programmatically, though. Perl allows you to compose a regex with in-line comments (https://perldoc.perl.org/perlfaq6.html#How-can-I-hope-to-use...), but it's still a hand-crafted regex, which is error-prone, much like composing code by hand. If you can get a machine to generate it for you, you avoid unintentional human-introduced bugs, as well as make it easier to read and reason about.
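For anyone who hasn't seen the in-line comment style, here is roughly what it looks like using Python's `re.VERBOSE` flag, which plays a similar role to Perl's `/x`. The key/value pattern and the `KEY_VALUE` name are made up for this example.

```python
import re

# Whitespace and '#' comments inside the pattern are ignored under re.VERBOSE.
KEY_VALUE = re.compile(
    r"""
    (?P<key>[^=\s]+)   # key: one or more characters that are not '=' or whitespace
    \s*=\s*            # separator: '=' with optional whitespace on either side
    (?P<value>.*)      # value: the rest of the line
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    m = KEY_VALUE.match("name = regex")
    if m:
        print(m["key"], m["value"])
```

It reads better, but it is still one hand-written pattern: nothing checks the pieces individually, which is the limitation described above.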

If you have a ton of regexes, or they are super important to your business, you should consider not editing them by hand. There's only so much test cases can do to prevent bugs.



