
This is slightly wrong, depending on whether your regexp engine requires non-greedy matches to find the shortest possible match. (I rarely use non-greedy matches because of this, among other reasons.)

    /"(?:\\.|.)*?"/
This may either stop at the first ", even if it is escaped (the \ can match the . alternative), or continue to the last ", even if it is escaped (for the same reason). If anything, you've introduced a vulnerability by breaking the escape handling.



I’m open to the possibility this will vary based on engine, but I don’t think it should. The pattern specifies:

1. Match a quote.

2. Match the shortest sequence of:

  A) any character preceded by a backslash, or

  B) any thus far unmatched character, until

3. Match the first unmatched quote.

The 2.A case should match before laziness kicks in. The only case when it shouldn’t is if the backslash has already been consumed by a previous escape.


On the main issue of your regexp being wrong: You can feed your regexp into any engine and see that it will happily match

    "\"
which it should not.
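
One way to check this claim is to run the pattern in a single backtracking engine. The sketch below uses Python's `re` module purely as an example; other engines may behave differently, and the names `LAZY` and `text` are only for illustration.

```python
import re

# The lazy pattern under discussion, written as a Python raw string.
LAZY = re.compile(r'"(?:\\.|.)*?"')

# Three characters: quote, backslash, quote. If the backslash escapes the
# second quote, this is an unterminated string and should not match.
text = '"\\"'

m = LAZY.search(text)
# Python's engine first tries \\. on the backslash and the quote, finds no
# closing quote after that, then backtracks and lets the bare . take the
# backslash so the final quote can close the match.
print(m.group() if m else None)
```

In Python this prints the whole three-character input, so the escaped quote ends up being treated as the terminator.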

As for the issue of varying between engines: The classic deviation between POSIX and non-POSIX regexp is whether you pick leftmost-longest and tiebreak by alternate order, or leftmost first alternate. (POSIX is the correct answer because there are benefits to commutative alternation.) Non-greedy regexp has a similar issue for whether it should prefer shortest or first alternate. (Proper thinking about regexes doesn't involve temporal relations like "before" or "thus far" or whether something is "consumed" - even in engines that don't implement leftmost-longest this can only confuse.)
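
To make the alternation-order point concrete, here is a minimal sketch in Python, whose `re` module is a leftmost-first engine. The patterns `a|ab` and `ab|a` are made up for illustration; a POSIX leftmost-longest engine should return `ab` for both.

```python
import re

# The same two alternatives, in both orders, against the same input.
first = re.search(r'a|ab', 'ab').group()
second = re.search(r'ab|a', 'ab').group()

print(first)   # 'a'  : the first alternative that succeeds wins
print(second)  # 'ab' : swapping the order changes the result
```

In a leftmost-first engine, alternation is therefore not commutative, which is the property the POSIX rule preserves.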

Engines that take alternation order into account will also have it match the complete string of:

    "\""
Probably to the confusion of anyone who read that non-greedy operators will choose the shortest match.
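
A small sketch of that mismatch, again assuming Python's `re` as the example engine (the names `LAZY`, `text` and `prefix` are illustrative only):

```python
import re

LAZY = re.compile(r'"(?:\\.|.)*?"')

text = '"\\""'      # four characters: quote, backslash, quote, quote
prefix = text[:3]   # the first three characters: quote, backslash, quote

# The three-character prefix is itself accepted by the pattern...
prefix_is_match = LAZY.fullmatch(prefix) is not None
# ...yet searching the full text returns all four characters, because the
# \\. alternative is tried first and succeeds.
found = LAZY.search(text).group()

print(prefix_is_match, found == text)
```

So a shorter match exists at the same starting position, but the engine never reports it.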

tl;dr: Don't use non-greedy operators. Don't think about regular language matching in temporal or procedural terms. Don't try to rewrite regexps without test cases.
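
As a sketch of what such test cases could look like, the snippet below checks two patterns against a handful of inputs in Python. `STRICT` is not from this thread; it is one commonly used greedy alternative, included here as an assumption, and the expected results assume that a backslash always escapes the character after it.

```python
import re

LAZY = re.compile(r'"(?:\\.|.)*?"')
# Assumed alternative, not from the thread: ordinary characters may not be
# a quote or a backslash, so a backslash can only appear as part of an escape.
STRICT = re.compile(r'"(?:\\.|[^"\\])*"')

# (input, expected match or None)
CASES = [
    ('"abc"', '"abc"'),
    ('"a\\"b" tail', '"a\\"b"'),
    ('"\\""', '"\\""'),
    ('"\\"', None),  # unterminated: the only closing quote is escaped
]

def check(pattern):
    """Return the inputs for which pattern.search disagrees with CASES."""
    failures = []
    for text, expected in CASES:
        m = pattern.search(text)
        got = m.group() if m else None
        if got != expected:
            failures.append(text)
    return failures

print(check(LAZY))
print(check(STRICT))
```

With these cases, the lazy pattern fails only on the unterminated input, while the assumed alternative passes all four.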



