
Python just works... Every now and then I'm still bitten by the 2->3 transition. And I'd very much not like it to be a required dependency in my systems either.

And no, regexps get a bad rep, but they cover the easy 99% of cases and are insanely quick to come up with. Learn basic syntax and you'll be thankful for decades to come.




Learn basic syntax and you'll spend the next few decades wondering each time _which_ basic syntax is expected because none of them ever say.


I know the basic syntax and I never have that problem. Perhaps you mean the advanced syntax? However, I don't see the problem there either. I usually don't need it, and to be honest, when I do, I find it more maintainable to use multiple simpler expressions combined with some programming.


Some by "regex" mean "globs" and you can just use "*" and "?" for stand-ins for some number of characters.

Some allow "|", some allow backrefs, some allow "()", or require them escaped with \, or allow them but not with * after. Some are case insensitive, some not. Some allow "{0,5}", some allow "[0-9]", some have handy things like "\w".

It's just the guessing game of exactly what they want. It should be required that each regex box has an example next to it using as many allowed features as possible.


> Some by "regex" mean "globs" and you can just use "*" and "?" for stand-ins for some number of characters.

But that's _not_ regexp, it's glob; a totally different pattern matching system. I mean, you can call a duck a horse, but that doesn't mean you're right... or that there's anything confusing about horses.
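To make the difference concrete, here is a minimal sketch in Python, assuming the standard `fnmatch` and `re` modules. The file names are made up for illustration. The same string "a*.txt" selects different names depending on whether it is read as a glob or as a regexp.

```python
import fnmatch
import re

# Hypothetical file names, chosen to show where the two systems disagree.
names = ["a.txt", "ab.txt", "aaXtxt", "b.txt"]

# Glob: "*" stands for any run of characters, "." is a literal dot.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "a*.txt")]

# Regexp: "*" means "zero or more of the preceding item", "." is any character.
# So "a*.txt" reads as: zero or more "a", any one character, then "txt".
regex_hits = [n for n in names if re.fullmatch(r"a*.txt", n)]

# fnmatch.translate() returns a regexp equivalent to the glob.
translated = fnmatch.translate("a*.txt")
translated_hits = [n for n in names if re.fullmatch(translated, n)]

print("glob:      ", glob_hits)
print("regexp:    ", regex_hits)
print("translated:", translated_hits)
```

The glob reading accepts "ab.txt" and rejects "aaXtxt"; the regexp reading does the opposite. A box that says "regex" but expects globs will quietly give the wrong answer for inputs like these.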


> Some by "regex" mean "globs" and you can just use "*" and "?" for stand-ins for some number of characters.

What kind of scary tool is that?


Glob is used in most shells, plus in several languages (Tcl has both glob and regexp matching).


Yeah, but they don't call it "regex" which would be the scary part.


Ah, I interpreted the original statement as

> Some [people] by "regex" mean "globs"


Ah, that makes far more sense. Though it makes OP's argument even more nonsensical ;)


Look for 'pcre' on the label. Accept no substitutes and you'll be fine.


*almost always: https://blog.cloudflare.com/details-of-the-cloudflare-outage...

>The Lua WAF uses PCRE internally and it uses backtracking for matching and has no mechanism to protect against a runaway expression.

https://blog.cloudflare.com/making-the-waf-40-faster/

>Back in July 2019, the WAF transitioned from using a regular expression engine based on PCRE to one inspired by RE2, which is based around using a deterministic finite automaton (DFA) instead of backtracking algorithms. This change came as a result of an outage where an update added a regular expression which backtracked enormously on certain HTTP requests, resulting in exponential execution time.

>After the migration was finished, we saw no measurable difference in CPU consumption at the edge, but noticed execution time outliers in the 95th and 99th percentiles decreased, something we expected given RE2's guarantees of a linear time execution with the size of the input.
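For anyone who hasn't seen runaway backtracking first hand, here is a small sketch using Python's `re` module, which is also a backtracking engine. The pattern `(a+)+$` is a textbook example, not the expression from the Cloudflare outage, and the input sizes are kept small on purpose so the script finishes quickly.

```python
import re
import time

# Nested quantifiers: a run of "a" characters can be split between the inner
# and outer "+" in many ways, and a backtracking engine tries them all before
# it can report that there is no match.
pattern = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Return (match result, seconds) for n 'a's followed by a single 'b'."""
    text = "a" * n + "b"  # the trailing "b" guarantees the match fails
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


for n in (10, 14, 18):
    result, seconds = time_failed_match(n)
    print(f"n={n:2d} matched={result is not None} seconds={seconds:.4f}")
```

On a backtracking engine the time typically grows much faster than the input length; exact numbers depend on the machine and the Python version. An automaton-based engine such as RE2 avoids this by design, at the cost of not supporting features like backreferences.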


You must not have suffered enough, I mean used regexes widely enough.

He meant “syntax” in the sense that different regex engines have different syntax and capabilities: can I do a negative lookahead assertion in engine Z, how do I do a zero-width lookaround in PCRE, GNU, Python, POSIX, etc.

Depending how far down the rabbit hole you want to go, start here:

https://swtch.com/~rsc/regexp/regexp1.html


Oh, and don't forget that just because the documentation says it uses a specific regex syntax, that does not mean it's correctly implemented. It gets worse when an application's examples specify one syntax, but in the background it uses the default regexp engine, which can be set by the system or the environment.


I just don't count those cases as 'basic' syntax.


Basic - naming a capture group. Awk / POSIX - \1. Perl - $1. Python - (?P<name>...).

I’d find it hard to ignore the specifics. I guess you could, if you only ever used one tool and were never exposed to other regex engines.
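As one data point, this is how the Python spelling looks in practice. It is a sketch using only the standard `re` module; the date string and the group names are made up for illustration, and other engines spell each of these differently.

```python
import re

date = "2024-11-12"  # example input, not from the thread

# Numbered groups are referenced by position.
numbered = re.match(r"(\d{4})-(\d{2})-(\d{2})", date)
year_by_number = numbered.group(1)

# Named groups: Python spells them (?P<name>...).
named = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", date)
year_by_name = named.group("year")

# Backreference inside a pattern: \1 by number, (?P=name) by name.
doubled_word = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it is is fine")

# Backreference in a replacement string: \1 by number, \g<name> by name.
swapped = re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"\g<day>/\g<month>/\g<year>",
    date,
)

print(year_by_number, year_by_name, doubled_word.group("word"), swapped)
```

That is four different notations for "the thing this group matched" inside a single engine, before any other tool enters the picture.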


Quick: which programs use ‘|’ and which use ‘\|’?
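One answer, as a sketch: in Python's `re` (as in PCRE and POSIX extended syntax) a bare `|` is alternation and `\|` is a literal pipe. GNU grep without -E reads them the other way round: `\|` is alternation (a GNU extension to basic syntax) and a bare `|` is a literal character. The Python half can be checked directly; the input string is made up for illustration.

```python
import re

text = "cat|dog"

# Bare | is alternation: match "cat" or "dog".
alternation = re.findall(r"cat|dog", text)

# Escaped \| is a literal pipe: match the whole string "cat|dog".
literal_pipe = re.findall(r"cat\|dog", text)

print(alternation)
print(literal_pipe)
```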



