
My favorite regex is the E-Mail Regex[1].

It works 99.99% of the time, is unreadable and unmaintainable, and makes you realize that the only true way to verify an e-mail address is to send an e-mail to it.

[1] - https://emailregex.com/

PS: Gotta love the Perl / Ruby version




The only way you should ever "validate" an email address with a regex is like this: /@/


I suppose it depends on what we mean by validate. Running an ecommerce site, I got a lot of mileage out of prompting the customer to fix emails that "looked wrong". We allowed them to proceed if they wanted. A really common one was "user@gnail.com" when "user@gmail.com" was wanted. We used a slightly modified version of https://github.com/mailcheck/mailcheck and found it to be really useful.
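
A rough sketch of that kind of "did you mean" prompt, using Python's standard difflib rather than mailcheck's own algorithm. The domain list, the cutoff, and the name suggest_email are assumptions made for illustration only:

```python
from difflib import get_close_matches

# Hypothetical short list of common domains; a real list would be much longer.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"]


def suggest_email(address, cutoff=0.8):
    """Return a suggested correction if the domain looks like a typo, else None."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None  # nothing sensible to suggest
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None  # already a known domain
    # Pick the single closest known domain above the similarity cutoff.
    matches = get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None


print(suggest_email("user@gnail.com"))
```

The suggestion is only a prompt: the customer can still keep what they typed, as described above.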


Surely that should be /.+@.+/


The one I use for anything that might take user input from a browser is the one defined in the HTML5 spec for input[type=email]:

    /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

There’s no sense being less permissive: if it’s good enough for browsers, it’s the baseline expected by browser users. But there’s no sense being more permissive either, for the same reason.
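
For server-side use, the same pattern can be sketched in Python. This is a direct transcription of the regex above; the name is_html5_email is made up for the example, and fullmatch replaces the ^ and $ anchors so a trailing newline is not accepted:

```python
import re

# Transcription of the input[type=email] pattern quoted above.
# re.fullmatch is used instead of ^...$ so the whole string must match.
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def is_html5_email(address):
    """True if the address matches the browser-style pattern."""
    return HTML5_EMAIL.fullmatch(address) is not None


for sample in ["user@example.com", "user@localhost", '"\\@"@example.com', "@"]:
    print(sample, is_html5_email(sample))
```

Note that the pattern does not require a dot in the domain, and it rejects quoted local parts.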


Yes, there is. HTML does not define what email addresses look like. If input[type=email] rejects valid addresses, it's harmful garbage.


Huh? My point is that if you expect user input from a browser’s input[type=email], you have little choice but to accept that it will reject emails not matching that pattern. Harmful garbage or not, a more permissive pattern won’t mitigate that.


Your regex would validate:

  this-is-not-a-valid-address@
  @this-is-not-valid-either
  @


But allowing too much is better than allowing too little, as you usually have to send an actual email to verify ownership anyway. Any regex more complex than /.+@.+/ rejects some valid email address.
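
A minimal sketch of what "send an actual email to verify ownership" might look like on the token side, assuming an HMAC-signed confirmation token. SECRET_KEY, make_token, check_token and the one-day expiry are all hypothetical, and the sending step itself is left out:

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical key; a real service would load a persistent secret from config.
SECRET_KEY = secrets.token_bytes(32)


def _sign(address, issued):
    payload = f"{address}|{issued}".encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_token(address, now=None):
    """Build the token that would be mailed to the address as part of a link."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}.{_sign(address, issued)}"


def check_token(address, token, max_age=86400, now=None):
    """True if the token was issued for this address and has not expired."""
    issued, _, sig = token.partition(".")
    if not (issued.isascii() and issued.isdigit()):
        return False
    expected = _sign(address, issued)
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return False
    current = time.time() if now is None else now
    return current - int(issued) <= max_age
```

Only someone who can read mail at the address sees the token, which is the check no regex can perform.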


Which was the point of my first message.


Surely that should be

/^[^@]+@[^@]+$/

?


According to the RFC compliant email regex,

   "\@"@example.com
is a valid email address, which your simplified test would reject.
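
To make the comparison concrete, here is a small sketch that runs the three simple patterns from this thread against the addresses mentioned so far. The helper name accepted_by is made up, and \A and \Z stand in for ^ and $ so that Python does not accept a trailing newline:

```python
import re

# The three simple patterns proposed in this thread.
PATTERNS = {
    "/@/": re.compile(r"@"),
    "/.+@.+/": re.compile(r".+@.+"),
    "/^[^@]+@[^@]+$/": re.compile(r"\A[^@]+@[^@]+\Z"),
}

SAMPLES = [
    "this-is-not-a-valid-address@",
    "@this-is-not-valid-either",
    "@",
    '"\\@"@example.com',  # quoted local part containing an escaped @
    "user@example.com",
]


def accepted_by(address):
    """Return the names of the patterns that accept the address."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(address)]


for sample in SAMPLES:
    print(f"{sample!r}: {accepted_by(sample)}")
```

Only /@/ lets the three malformed strings through, while the [^@] version is the only one that rejects the quoted address.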


Exactly.

The worst regex would be:

  /^[a-zA-Z0-9\-_]+@[a-zA-Z0-9\-_]+\.[a-zA-Z0-9\-_]+$/
Because it would reject `my-email+custom-inbox@example.com`. And that's a pattern I use to automatically sort incoming mail.

Many websites use such a regex :(
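
A quick check of that pattern in Python (a sketch; is_strict_email is a made-up name, and fullmatch replaces the ^ and $ anchors):

```python
import re

# The overly strict pattern quoted above.
STRICT = re.compile(r"[a-zA-Z0-9\-_]+@[a-zA-Z0-9\-_]+\.[a-zA-Z0-9\-_]+")


def is_strict_email(address):
    """True if the strict pattern accepts the whole address."""
    return STRICT.fullmatch(address) is not None


for sample in [
    "my-email@example.com",
    "my-email+custom-inbox@example.com",  # plus addressing
    "first.last@example.com",             # dot in the local part
    "user@mail.example.com",              # more than one dot in the domain
]:
    print(sample, is_strict_email(sample))
```

Besides plus addressing, it also rejects dots in the local part and any domain with more than one dot.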


The correct way would be to implement a parser off the ABNF defined in whatever the current RFC is for email addresses.
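
As a rough illustration of the parser approach, here is a hand-written sketch for a small subset of the addr-spec grammar, assuming RFC 5322 section 3.4.1 as the reference. It handles only dot-atom and quoted-string local parts with a dot-atom domain; comments, folding whitespace, domain literals, obsolete forms and the exact qtext character set are deliberately ignored. All function names are made up:

```python
import string

# atext per RFC 5322: letters, digits and these printable specials.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def _dot_atom(s, i):
    """Consume 1*atext *("." 1*atext) from index i; return end index or -1."""
    while True:
        j = i
        while j < len(s) and s[j] in ATEXT:
            j += 1
        if j == i:
            return -1  # empty atom (leading, trailing or doubled dot)
        i = j
        if i < len(s) and s[i] == ".":
            i += 1
        else:
            return i


def _quoted_string(s, i):
    """Consume a double-quoted string with backslash escapes; end index or -1."""
    if i >= len(s) or s[i] != '"':
        return -1
    i += 1
    while i < len(s):
        if s[i] == "\\":
            if i + 1 >= len(s):
                return -1  # dangling backslash
            i += 2  # quoted-pair: skip the escaped character
        elif s[i] == '"':
            return i + 1
        else:
            i += 1  # simplification: any other character is accepted
    return -1  # unterminated quote


def parse_addr_spec(s):
    """Return (local_part, domain), or None if s is outside the supported subset."""
    i = _quoted_string(s, 0) if s.startswith('"') else _dot_atom(s, 0)
    if i <= 0 or i >= len(s) or s[i] != "@":
        return None
    if _dot_atom(s, i + 1) != len(s):
        return None
    return s[:i], s[i + 1:]


print(parse_addr_spec('"\\@"@example.com'))
print(parse_addr_spec("this-is-not-a-valid-address@"))
```

Unlike the simple regexes, this accepts the quoted address with an escaped @ and still rejects a missing local part or domain.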


What exactly is the current RFC for email addresses? Do you go with 2821/2822 or do you read all the extension ones?


And yet, the successfully parsed e-mail address can still be nonexistent and therefore invalid :)


Or, more insidiously, the address can exist, its recipient can receive mail there, your form can validate it, but your SMTP server can't handle one of its characters and so can't send anything to it in the first place.



