
Could you just clarify something? You state that this data is anonymous, but also that you use phone numbers as nodes. Do you mean some sort of ID number representing phone numbers, or actual phone numbers? I ask because I wouldn't consider phone numbers anonymous.



They could easily SHA1 the phone numbers to "anonymize" them.


The input space is too small for SHA1 to effectively anonymize. The NANP, for example, uses ten-digit numbers, so it has fewer than 10^10 possible numbers; it would be a very simple task to build a lookup table (or a rainbow table) mapping every possible phone number to its corresponding SHA1 hash.
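To make that concrete, here is a minimal sketch of the lookup-table attack. It enumerates only the 10,000 numbers under one made-up prefix so that it runs instantly; the prefix, the sample number, and the function names are all illustrative assumptions, and covering the full number space is the same loop with a larger range.

```python
import hashlib


def sha1_hex(number: str) -> str:
    """Return the hex SHA-1 digest of a phone number given as a digit string."""
    return hashlib.sha1(number.encode("ascii")).hexdigest()


def build_lookup(prefix: str, width: int = 4) -> dict:
    """Map digest -> number for every number that starts with `prefix`.

    `prefix` and `width` are hypothetical parameters chosen to keep the
    demo small; a real attacker would enumerate the whole number space.
    """
    table = {}
    for n in range(10 ** width):
        number = f"{prefix}{n:0{width}d}"
        table[sha1_hex(number)] = number
    return table


# Build the table once, then reverse any "anonymized" value by lookup.
table = build_lookup("212555")
leaked_digest = sha1_hex("2125550142")   # what the published dataset would contain
recovered = table.get(leaked_digest)     # the original number, or None if not covered
```

Once the table exists, every hashed node in the dataset is reversed with a single dictionary lookup.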

For the same reason, you can't just use a simple cryptographic hash to "anonymize" data such as birthdates, zip codes, SSNs, or PINs.

Using a key derivation function with a very high cost factor can mitigate this to some extent (e.g. making it take 5 seconds on an average CPU to generate the hash from a phone number), but it by no means makes for secure anonymization; eventually computing power will catch up.
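A rough sketch of that trade-off, using PBKDF2 from Python's standard library as a stand-in for whatever KDF one would actually pick. The iteration count, the salt, and the 10**10 upper bound on ten-digit numbers are assumptions for illustration. Note that the salt has to be the same for the whole dataset, otherwise the same phone number would no longer map to the same node.

```python
import hashlib

SECONDS_PER_YEAR = 365 * 24 * 3600


def slow_hash(number: str, salt: bytes, iterations: int) -> str:
    """Derive a deliberately expensive digest from a phone number."""
    raw = hashlib.pbkdf2_hmac("sha256", number.encode("ascii"), salt, iterations)
    return raw.hex()


def brute_force_cpu_years(keyspace: int, seconds_per_hash: float) -> float:
    """CPU-years needed to hash every candidate once on a single core."""
    return keyspace * seconds_per_hash / SECONDS_PER_YEAR


# Illustrative values only: tune `iterations` until one hash costs what you want.
digest = slow_hash("2125550142", b"one-salt-per-dataset", 200_000)

# At 5 seconds per hash, covering 10**10 candidates costs about 1585 CPU-years.
years = brute_force_cpu_years(10 ** 10, 5.0)
```

That figure is for one core; the work parallelizes perfectly, so it shrinks in proportion to the hardware an attacker can rent, which is why this only mitigates the problem.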

Encrypting the number with a secret key (or using an HMAC) and destroying the key after the anonymization takes place might be a reasonably secure way of doing this, however.
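A minimal sketch of the HMAC variant, assuming the whole dataset can be processed in one pass. The function name is hypothetical. The key is generated inside the function and never stored or returned, which is the "destroy the key" step; Python cannot guarantee that the bytes are wiped from memory, so treat this as an illustration of the idea rather than a hardened implementation.

```python
import hashlib
import hmac
import secrets


def pseudonymize(numbers):
    """Replace each phone number with an HMAC-SHA256 tag under a throwaway key.

    The same number always maps to the same tag within one call, so the
    graph structure is preserved, but without the key the tags cannot be
    recomputed from a list of candidate numbers.
    """
    key = secrets.token_bytes(32)  # random 256-bit key, exists only during this call
    tags = [
        hmac.new(key, number.encode("ascii"), hashlib.sha256).hexdigest()
        for number in numbers
    ]
    del key  # drop our reference; the key is never written anywhere
    return tags


tags = pseudonymize(["2125550142", "2125550199", "2125550142"])
```

Here `tags[0]` and `tags[2]` are equal, while a second call on the same input produces entirely different tags because it draws a fresh key.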


Maybe just salt each number with a random salt?


Yep, we effectively did this. But as my comment alludes to, it doesn't really matter. You have enough to uniquely identify someone.



