
A hash of the IP would be sufficient for that purpose and would keep the location anonymous.



IPv4 isn't that big: there are only 2^32 (about 4.3 billion) addresses. You could compute the hashes of all of them on consumer hardware and reverse any unsalted hash by lookup. You would want to add a private salt.
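
To make the enumeration point concrete, here is a rough sketch. It assumes the published value is a plain SHA-256 of the dotted-quad string (the thread doesn't say which hash or encoding would be used) and only searches a /24 documentation range so it finishes instantly; the function names are made up for illustration. The same loop over the full IPv4 space is just a bigger range.

```python
import hashlib
import ipaddress


def unsalted_hash(ip: str) -> str:
    # Assumption: the "anonymized" value is SHA-256 over the textual address.
    return hashlib.sha256(ip.encode()).hexdigest()


def reverse_by_enumeration(target_hash: str, network: str):
    # Try every address in the network until one hashes to the target.
    for addr in ipaddress.ip_network(network):
        if unsalted_hash(str(addr)) == target_hash:
            return str(addr)
    return None


# Hypothetical leaked hash, recovered by brute force over a small range.
leaked = unsalted_hash("192.0.2.57")
recovered = reverse_by_enumeration(leaked, "192.0.2.0/24")
```

With a secret value mixed in, the attacker can't run this loop without also knowing the secret.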


If you add a private salt, you won't be able to do "grouping" or identifying duplicates, which is what this thread was discussing.

If it were me, and I wanted "independent" researchers to highlight clusters or duplicates, I would do the following as a first-pass solution:

Store an internal mapping of IP -> unique number, and do the same for usernames. The goal is that each number is arbitrary: randomly assigned and not derived from any hash or ordering of the input. That way people who know the IP, the username, or both can't work out the unique internal numbers.
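
A minimal sketch of that mapping, assuming the numbers are drawn at random rather than from a counter (a counter would reveal the order in which values were first seen). The `Pseudonymizer` class and its method names are hypothetical, and a real system would need to persist the table somewhere private.

```python
import secrets


class Pseudonymizer:
    """Maps each distinct value to a stable, randomly chosen 64-bit number."""

    def __init__(self):
        self._ids = {}      # value -> assigned number (kept private)
        self._used = set()  # numbers already handed out

    def id_for(self, value: str) -> int:
        if value not in self._ids:
            # Draw until we get a number nobody else has, so IDs stay unique.
            while True:
                candidate = secrets.randbits(64)
                if candidate not in self._used:
                    break
            self._used.add(candidate)
            self._ids[value] = candidate
        return self._ids[value]


# Separate tables for IPs and usernames, as the comment suggests.
ip_ids = Pseudonymizer()
user_ids = Pseudonymizer()

# Hypothetical record: only the two numbers would be released.
released = (user_ids.id_for("alice"), ip_ids.id_for("198.51.100.7"))
```

Repeated values get the same number, so duplicates and clusters are still visible in the released data, but nothing about the number can be recomputed from the input.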

Then release those. Though tbf, if I was part of any sort of "bot prevention" or "sock puppet identification" team at Reddit, I'd be doing this already. But we all know the dirty secret is to not actually track down such abuse, but to appear like you are doing so, so that you can inflate your user count with plausible deniability.


Works just fine with a keyed hash (same key & IV for everyone).
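
Roughly like this, assuming HMAC-SHA256 as the keyed hash and one secret key reused for every record (HMAC takes no IV, so there is none here). The `pseudonymize` name and the in-memory key are illustrative only; in practice the key would live somewhere private and never be released.

```python
import hashlib
import hmac
import secrets

# Assumption: one secret key shared across the whole dataset.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(ip: str, key: bytes = SECRET_KEY) -> str:
    # Same key + same input -> same output, so grouping still works.
    return hmac.new(key, ip.encode(), hashlib.sha256).hexdigest()


a = pseudonymize("198.51.100.7")
b = pseudonymize("198.51.100.7")
c = pseudonymize("198.51.100.8")
```

Here `a` and `b` match, so duplicate IPs still group together, while anyone without the key can't enumerate the address space to reverse them.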

Just make sure to remove the NSFW accounts to avoid future "incidents".



