Username = hex(sha1(topic_id + ip))[10:14]. Notably, no salt is used.
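A minimal Python sketch of that derivation. It assumes topic_id and the IP are concatenated as plain strings before hashing (the exact encoding isn't stated) and reads the slice as hex characters 10 through 13; `display_name` is a hypothetical name.

```python
import hashlib

def display_name(topic_id: str, ip: str) -> str:
    """Hypothetical reconstruction: unsalted SHA-1 of the concatenated
    strings, then keep four hex characters of the digest."""
    digest = hashlib.sha1((topic_id + ip).encode("ascii")).hexdigest()
    return digest[10:14]  # 4 hex chars = 16 bits shown publicly

print(display_name("12345", "203.0.113.7"))
```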

upon_drumhead · on July 23, 2023

the topic_id could be considered a salt, no?

fbdab103 · on July 23, 2023

The topic_id is shared by everyone who posted in that topic. For each topic_id, it is then a matter of hashing 4 billion IPs once and matching every post in the topic against the results. With a different salt applied to each user, you would instead need 4 billion hashes for each user's post to a topic (topic_id+IP+salt).
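To illustrate the "once per topic" point, here is a sketch that hashes every address in a network for one topic and indexes the results, so each post in that topic becomes a dictionary lookup. It assumes the name is sha1(topic_id + ip) over concatenated strings with hex characters [10:14] kept; `build_topic_table` is a hypothetical helper, and the demo scans a /24 instead of all 2^32 IPv4 addresses.

```python
import hashlib
import ipaddress
from collections import defaultdict

def display_name(topic_id: str, ip: str) -> str:
    # Assumed scheme: sha1(topic_id + ip), keep hex chars [10:14]
    return hashlib.sha1((topic_id + ip).encode("ascii")).hexdigest()[10:14]

def build_topic_table(topic_id: str, network: str) -> dict[str, list[str]]:
    """Hash every address in `network` for one topic, grouped by shown name.
    Because topic_id is public and shared, one scan serves every post in it."""
    table: dict[str, list[str]] = defaultdict(list)
    for ip in ipaddress.ip_network(network):
        table[display_name(topic_id, str(ip))].append(str(ip))
    return dict(table)

table = build_topic_table("12345", "203.0.113.0/24")
shown = display_name("12345", "203.0.113.7")  # a name seen on some post
print(shown, table[shown])
```

With a per-user salt, the table could not be reused: each post would need its own full scan of the address space.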

usaar333 · on July 23, 2023

There are no user accounts on the system.

I would consider the topic a salt. The problem is that the input space is so small: just a 32-bit number, which makes the "password" (the user's IP) fast to brute-force.

The sane solution would be to generate large random IDs per (IP address, topic) pair, and burn the mapping after some time.
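One way that suggestion could look, as a sketch only: an in-memory map from (IP, topic) to a 128-bit random ID that forgets entries after a TTL. `EphemeralIdMap` and its parameters are hypothetical, and a real deployment would also need to handle persistence and concurrency.

```python
import secrets
import time

class EphemeralIdMap:
    """Hypothetical sketch: (ip, topic_id) -> random ID, burned after ttl seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[tuple[str, str], tuple[str, float]] = {}

    def id_for(self, ip: str, topic_id: str) -> str:
        self._purge()
        key = (ip, topic_id)
        if key not in self._entries:
            # 128 random bits, not derived from the IP, so nothing to brute-force
            self._entries[key] = (secrets.token_hex(16), self.clock())
        return self._entries[key][0]

    def _purge(self) -> None:
        # Drop mappings older than the TTL; that pair gets a fresh ID next time
        now = self.clock()
        expired = [k for k, (_, created) in self._entries.items()
                   if now - created > self.ttl]
        for k in expired:
            del self._entries[k]
```

Once a mapping is purged, the server itself can no longer link old posts to an IP, which is the point of burning it.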

bagels · on July 23, 2023

topic_id is public information, and predictable. It's neither secret, nor random.

rawling · on July 23, 2023

This is a weird use case (deliberately making the hash public) and the usual concept of a salt feels weird here. Any kind of server-side secret would have effectively stopped this attack, even if it was the same in every hash.
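A sketch of the server-side-secret idea, swapping in HMAC-SHA256 (a standard keyed hash) for the bare SHA-1; `SERVER_KEY` and `display_name` are hypothetical. The "|" separator is an added assumption that prevents ambiguous concatenations such as "1" + "23.0.0.1" versus "12" + "3.0.0.1".

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret, generated once and never published
SERVER_KEY = secrets.token_bytes(32)

def display_name(topic_id: str, ip: str, key: bytes = SERVER_KEY) -> str:
    # Without `key`, nobody outside the server can compute candidate names
    # for the 2**32 IPs, even though topic_id is public.
    mac = hmac.new(key, f"{topic_id}|{ip}".encode("ascii"), hashlib.sha256)
    return mac.hexdigest()[:4]

print(display_name("12345", "203.0.113.7"))
```

The key is the same for every hash, as described above; if it ever leaks, the scheme falls back to being brute-forceable.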

raverbashing · on July 23, 2023

But the question is: how many IP collisions are there when only 4 hex digits of the hash are shown?

I think that's where the anonymity claims might have come from.

fbdab103 · on July 23, 2023

Assuming the four hex characters are evenly distributed, there are 16^4 = 65,536 possible names, so each name is shared by roughly 2^32 / 16^4 = 65,536 IPs. From my quick skim of the paper, the authors made some assumptions about posting tendencies (frequent posters are likely to comment on multiple topics) and looked for enriched patterns of IP addresses: an IP address assigned to multiple topics within a short timeframe is more likely to be real. As a control set, they took a different four characters of the hash (e.g. the true set used [10:14], the control set took [11:15]) and used that as their statistical threshold.
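The collision arithmetic, plus a rough illustration of the control-window idea as summarized above. This is my reading of the comment, not the paper's actual code; it assumes sha1(topic_id + ip) over concatenated strings, and `window` and `match_count` are hypothetical helpers.

```python
import hashlib

IPV4_SPACE = 2 ** 32
SHOWN_VALUES = 16 ** 4  # four hex characters = 16 bits

# On average, each displayed name is shared by this many IPv4 addresses
ips_per_name = IPV4_SPACE // SHOWN_VALUES  # 65536

def window(topic_id: str, ip: str, start: int) -> str:
    # start=10 is the real window; start=11 is a shifted "control" window
    return hashlib.sha1((topic_id + ip).encode("ascii")).hexdigest()[start:start + 4]

def match_count(ip: str, observed: list[tuple[str, str]], start: int) -> int:
    """How many observed (topic_id, shown_name) posts this IP would reproduce."""
    return sum(window(t, ip, start) == name for t, name in observed)

# An IP that really wrote posts in several topics matches all of them under the
# real window; under the control window each match is a 1-in-65536 coincidence.
```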

raverbashing · on July 23, 2023

Sounds like there's some margin for plausible deniability there, especially if they can make it match different universities in the same range.