> You can't use a salt, the contact you want to match has to be able to compute the same hash.
I think the misunderstanding here is that I meant one salt for all users at a time, instead of one salt for each hash.
And as long as stored hashes can be recomputed (by active users), the 'global salt' can be rotated over time.
We can pre-compute hashes with future salts in case apps are not constantly online. Then we could rotate salts, say, once every 2 hours.
The point being to make rainbow tables more expensive.
> You can hash the contact pair instead of one side of it, but even then an attacker can rent X instances of some hardware that is Y times faster than your phone, so if you target a difficulty of ~1 second per contact on your phone, you get a difficulty of 2700 hours / XY for the attacker to recover all of your contacts
> So if Y is 10 times faster than your phone, you only need X=10 to recover the contacts for a given number in a day. If the attacker is willing to set X=1000, Y isn't even relevant.
I think this should be (10^10/3600 ~ 2.7 million hours) / XY. Also, let's say we can make it Y seconds (perhaps the hash is memory hard, etc). Then to compute a single table (per salt) we need 2.7e6 hours, at say $0.01 per hour, so $27k per table.
Taken together that would mean to reveal the contacts of signups happening within a 2 hour window, would cost quite a sum.
It's not secure, of course - but it does give some measure of privacy.
If you think users are willing to wait more than an hour, you could 10x the cost too. But that's probably not practical anymore!
Edit: Oh. I see I made a mistake here! If the server computes a single rainbow table it will be able to retroactively de-anonymize all users that are already active at that time! :( Sorry.
Yeah, somehow I missed a couple of zeros, it would be 2.7 million.
Such a salt wouldn't tighten up the window though, for contact discovery to work, all the outputs have to be available to whatever is doing the matching.
OK hold on. Say the rotating salt thing doesn't work, for whatever reason.
But you yourself made a better suggestion! If we do hash (phone #) contact pairs, and every client publishes multiple hashes, one for each of its contacts, then a cheating server would need to compute (max) (10^10 * 10^10)/3600 ~ 2.7e17 hours worth of hashes.
That would actually be secure, no?
Edit: messed up the math myself this time; I think this is correct now though
For the contact discovery to be private, the phone has to calculate the hash. So the server cost also depends on the performance differential (many entities would likely still spend $50,000 to discover the contacts of a high value target).
Adding information increases the cost, but it also reduces the likelihood of a match.
You've edited your comment since I wrote the above. Anyway, if the attack is directed at a single user, there's only one 10^10 involved (and really, the phone number space isn't that big).
Really sorry about the edits! Thinking in real time :)
So about the size of the phone number space. Interestingly I guess you could at least limit it by country.
But, unless you already know the mobile phone numbers of the target's acquaintances, (in which case there is no real danger of 'leaking' your social graph),
you can't really limit it more, because mobile phone numbers are pretty random within each country's phone number space (I think).
So say the US has at least 0.5 billion assigned mobile numbers, and you don't know which are assigned, so the space could easily be 10^10 for just the US.
(I can't find any quick numbers on Google right now).
But even if we go down to say, 100 million, we would still get (10^7 * 10^7)/3600 ~ 2.7e11 hours worth of hashing to get all contact pairs. Which is still secure.
Now as far as limiting one end point to a single phone number - that would be a special case. I'm not sure if this isn't moving the goal post a bit.
As long as the server does not have access to your own phone number, it, or anyone who has hacked into the server,
cannot really use that to limit their computations. Unless they are already targeting you through other means (possibly you are a person of interest to someone), and so know your phone #.
In that case yes, they could reveal (part of) your social graph for a (maybe) affordable cost. But in that very special case, I think we are dealing with nation-state adversaries.
And in that case, the telcos probably already have a list of your phone # contacts. So it would probably be easier to simply get it from them.
The only thing left to be revealed would be that you communicated with a specific acquaintance using that particular app, at some particular time.
So a targeted attack could leak that metadata.
It's not perfect.. but I think it might just be good enough for most users.
If we go off the deep end with an attack tailored to a single phone #, we also have to worry about a whole lot of other attack vectors that are probably more important.
Lastly, note that the user (if sufficiently paranoid) could simply opt out of the automatic contact search post-install.
As for mobile phone numbers being randomly placed inside larger space of all phone numbers, while this makes some sense for US/NANP, in most of the rest of the world mobile numbers have some semi-well defined prefixes (as in "number that starts with 6 or 7 is certainly mobile phone, discerning what operator is sligthly non-trivial") and even for NANP there probably is some publicly available database that says whether given number is mobile or not.
I don't think it's moving the goalposts, the ability to focus an attack on an individual number is an important consideration for an individual evaluating the system. That only their contacts are revealed is not much consolation when their contacts are revealed.
The ability to move out to the phone network at observe metadata there does make the discussion somewhat academic.
I think the misunderstanding here is that I meant one salt for all users at a time, instead of one salt for each hash.
And as long as stored hashes can be recomputed (by active users), the 'global salt' can be rotated over time.
We can pre-compute hashes with future salts in case apps are not constantly online. Then we could rotate salts, say, once every 2 hours.
The point being to make rainbow tables more expensive.
> You can hash the contact pair instead of one side of it, but even then an attacker can rent X instances of some hardware that is Y times faster than your phone, so if you target a difficulty of ~1 second per contact on your phone, you get a difficulty of 2700 hours / XY for the attacker to recover all of your contacts
> So if Y is 10 times faster than your phone, you only need X=10 to recover the contacts for a given number in a day. If the attacker is willing to set X=1000, Y isn't even relevant.
I think this should be (10^10/3600 ~ 2.7 million hours) / XY. Also, let's say we can make it Y seconds (perhaps the hash is memory hard, etc). Then to compute a single table (per salt) we need 2.7e6 hours, at say $0.01 per hour, so $27k per table.
Taken together that would mean to reveal the contacts of signups happening within a 2 hour window, would cost quite a sum.
It's not secure, of course - but it does give some measure of privacy.
If you think users are willing to wait more than an hour, you could 10x the cost too. But that's probably not practical anymore!
Edit: Oh. I see I made a mistake here! If the server computes a single rainbow table it will be able to retroactively de-anonymize all users that are already active at that time! :( Sorry.