
Also, an SSN isn't that distinctive a format: nnn nn nnnn. Check bits and reserved prefixes were all removed decades ago when it became clear we'd run out unless we used the whole namespace (and even then that only buys us until about 2100). \d{3}\s?\d{2}\s?\d{4} will match a surprising amount.
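To see how loosely that pattern matches, here is a minimal Python sketch. The sample strings are made up for illustration and none is a real SSN; `loose_hits` is just a hypothetical helper name.

```python
import re

# The loose pattern quoted above: 3, 2, and 4 digits with optional whitespace.
LOOSE_SSN = re.compile(r"\d{3}\s?\d{2}\s?\d{4}")

def loose_hits(text: str) -> list[str]:
    """Return every substring the loose pattern matches."""
    return LOOSE_SSN.findall(text)

# Hypothetical inputs; only the first is actually formatted like an SSN.
samples = [
    "123 45 6789",        # SSN-shaped
    "order 4155550123",   # 10-digit number: its first 9 digits match
    "id=20240131123456",  # timestamp-like digit run: first 9 digits match
]

for s in samples:
    print(s, "->", loose_hits(s))
```

Because nothing anchors the pattern, any run of nine or more digits produces a hit.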

Detecting SSNs is hard without accepting a high false positive rate. Much harder than phone numbers, credit card numbers, or cloud credentials.




\b\d{3}[\s\-]?\d{2}[\s\-]?\d{4}\b should not produce many false positives.
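A rough Python sketch of the anchored version, assuming the intent is a word boundary (\b) on each side so the nine digits cannot sit inside a longer run of digits or letters. `bounded_hits` is a hypothetical helper name and the inputs are invented.

```python
import re

# Assumption: \b on both ends, optional space or hyphen between groups.
BOUNDED_SSN = re.compile(r"\b\d{3}[\s\-]?\d{2}[\s\-]?\d{4}\b")

def bounded_hits(text: str) -> list[str]:
    """Return every substring the word-bounded pattern matches."""
    return BOUNDED_SSN.findall(text)

print(bounded_hits("ssn 123-45-6789."))   # SSN-shaped, matches
print(bounded_hits("order 4155550123"))   # 10-digit run, rejected by \b
print(bounded_hits("zip 94107-1234"))     # ZIP+4 still slips through
```

The boundaries cut out hits inside longer digit runs, but a ZIP+4 such as 94107-1234 still matches, because the hyphen is accepted as the second optional separator.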

You can also try to guess whether something is a list of SSNs from the context.


I'd assume many systems store SSNs without spaces or dashes in the backend so that rendering is up to the client, which means you're looking for 9-digit strings. But plenty of other things are 9-digit strings too: full ZIP codes (xxxxx-xxxx), for example.
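As a small illustration of that collision, here is a Python sketch that strips separators the way a backend might before storing. The function names and sample values are hypothetical.

```python
import re

def digits_only(value: str) -> str:
    """Drop every non-digit character, as a backend might before storing."""
    return re.sub(r"\D", "", value)

def is_nine_digits(value: str) -> bool:
    """True if the value reduces to exactly nine digits."""
    return re.fullmatch(r"\d{9}", digits_only(value)) is not None

# An SSN-shaped value and a ZIP+4 both reduce to nine digits.
print(digits_only("123-45-6789"), is_nine_digits("123-45-6789"))
print(digits_only("94107-1234"), is_nine_digits("94107-1234"))
```

Once the separators are gone, nothing in the stored value distinguishes the two.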


I've posted elsewhere in this thread about this. There's really no reason to expect SSNs to be stored as strings for internal use. A 32-bit integer readily represents one, since the largest possible SSN is just a 9-digit number. I've seen at least one client store SSNs as INTs in a database and handle left-padding to 9 characters and inserting hyphens in display code.

Any 9-digit integer is immediately suspect under this reasonable storage choice.
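A minimal Python sketch of the integer-storage scheme described above, assuming the display code left-pads to nine digits and inserts hyphens after the third and fifth. `format_ssn` is a hypothetical name, not anything from the client's actual code.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647, comfortably above 999,999,999

def format_ssn(stored: int) -> str:
    """Render an integer-stored SSN as nnn-nn-nnnn, restoring leading zeros."""
    if not 0 <= stored <= 999_999_999:
        raise ValueError("value does not fit in nine digits")
    digits = f"{stored:09d}"  # left-pad with zeros to nine characters
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

print(format_ssn(123456789))
print(format_ssn(1234567))  # stored with only seven digits, padded on display
```

Note that values with leading zeros are stored as integers with fewer than nine digits, so a scan that looks only for exactly nine-digit integers would miss those.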



