
// from appendix B of rfc 3986 (http://www.ietf.org/rfc/rfc3986.txt)

'&^(([^:/?#]+):)?(//([^/?#]))?([^?#])(\?([^#]))?(#(.))?&'

The regular expression above is meant to match URIs. Since almost anything can be a URI, the regex also matches almost everything.




You know you have done too much work with regular expressions when you think "Hey, wait a second, that can't possibly work" and spend 10 minutes trying to debug it in the Ruby console before realizing "Oh, HN is italicizing it because of the asterisks it is silently stripping."


  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
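For anyone who wants to poke at it, here is a minimal Python sketch of how the pattern above splits a URI, assuming the group numbering RFC 3986 Appendix B describes (2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment). The name split_uri is made up for illustration and is not from the RFC or any library.

```python
import re

# Pattern as given in RFC 3986, Appendix B (asterisks intact).
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_uri(uri):
    """Hypothetical helper: split a URI reference into its five components."""
    # Every group is optional, so the match succeeds on any input string.
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

# The example URI used in Appendix B itself.
print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))

# Arbitrary text still "matches": it all lands in the path component.
print(split_uri("not a uri at all"))
```

Note that this only splits a string into components. It does not validate anything, which is why arbitrary text comes back as a bare path.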


The article was about e-mail addresses. Here's a link to an RFC-compliant regular expression for matching valid e-mail addresses: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

Despite that evil voice telling you "There is nothing Perl cannot do!" (http://www.bastichlabz.org/bastich/Strips/ba980225.gif), regular expressions alone are not sufficient for real parsing.



