
A Googler gave a talk at Real World Crypto in January describing a system that might back this. The relevant part of the talk starts around the 10 minute mark: https://www.youtube.com/watch?v=ee7oRsDnNNc&t=10m1s

In short, they're able to compute the intersection of the set of users who have viewed an ad with the set of people who purchased a product in a store, without either party disclosing their side of the set.




After studying the slides, it looks like they do a Diffie-Hellman-like exchange of user identifiers.

g_i = Google's identifier for customer i. s_j = the Merchant's identifier for customer j.

If g_i == s_j then customer i == j. These identifiers might be phone numbers or email addresses or other identifiers that both parties have.

Neither party wants the other to learn any identifiers it doesn't already know about.

So Google picks a random secret value G and sends g_i^G to the Merchant. The Merchant picks a random secret value T and sends g_i^(G*T) back to Google. Additionally, the Merchant sends s_j^T to Google.

Then Google calculates s_j^(T*G). If s_j^(T*G) == g_i^(G*T) then s_j == g_i and i == j. So now Google knows the exact set of its users who purchased something at the Merchant.
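
To make the matching step concrete, here is a minimal sketch in Python. The modulus, identifiers, and the names hash_to_group/blind are all illustrative and not from the talk; a real deployment would use a properly analysed Diffie-Hellman or elliptic-curve group. The point is just that the two blindings commute, so equal identifiers produce equal double-blinded values.

    import hashlib
    import secrets

    # Toy modulus for illustration only (the 2^255 - 19 field prime); a real
    # system would use a well-analysed DH or elliptic-curve group.
    P = 2**255 - 19

    def hash_to_group(identifier):
        # Map an identifier (email, phone number, ...) to a group element.
        digest = hashlib.sha256(identifier.strip().lower().encode()).digest()
        return int.from_bytes(digest, "big") % P

    def blind(element, secret):
        # Raise a group element to a party's secret exponent.
        return pow(element, secret, P)

    # Each party's own customer identifiers (the g_i and s_j above).
    google_ids = ["alice@example.com", "bob@example.com", "carol@example.com"]
    merchant_ids = ["bob@example.com", "dave@example.com"]

    G = secrets.randbelow(P - 3) + 2   # Google's secret exponent
    T = secrets.randbelow(P - 3) + 2   # Merchant's secret exponent

    # Google -> Merchant: g_i^G
    google_blinded = [blind(hash_to_group(g), G) for g in google_ids]

    # Merchant -> Google: (g_i^G)^T in the same order, plus its own s_j^T
    double_blinded = [blind(x, T) for x in google_blinded]
    merchant_blinded = [blind(hash_to_group(s), T) for s in merchant_ids]

    # Google raises each s_j^T to G; s_j^(T*G) == g_i^(G*T) implies s_j == g_i,
    # so Google learns which of its own users bought something at the Merchant.
    merchant_double = {blind(x, G) for x in merchant_blinded}
    matched = [g for g, y in zip(google_ids, double_blinded) if y in merchant_double]
    print(matched)   # -> ['bob@example.com']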

Additionally, for each s_j^T, the Merchant sends a homomorphically encrypted value for the amount that customer spent. Google can then perform a homomorphic addition over the encrypted values for only the intersection it calculated. The Merchant can then decrypt the result to get the total sum and share it back with Google.
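
And a sketch of that aggregation step, using the python-paillier ("phe") package as a stand-in additively homomorphic scheme (the talk doesn't say which scheme is actually used). The records and the True/False flags are placeholders for the blinded identifiers and the intersection test from the previous sketch.

    # pip install phe
    from phe import paillier

    # The Merchant holds the keypair, so only the Merchant can open the sum.
    merchant_pub, merchant_priv = paillier.generate_paillier_keypair()

    # Merchant -> Google: an encrypted spend amount (in cents) alongside each
    # blinded identifier s_j^T. The flags stand in for the intersection test.
    records = [
        ("s_1^T", merchant_pub.encrypt(4599), True),    # matched some g_i^(G*T)
        ("s_2^T", merchant_pub.encrypt(1250), False),   # no match
    ]

    # Google homomorphically adds only the ciphertexts in the intersection;
    # it never sees the individual plaintext amounts.
    encrypted_total = merchant_pub.encrypt(0)
    for _blinded_id, enc_amount, in_intersection in records:
        if in_intersection:
            encrypted_total = encrypted_total + enc_amount

    # Google -> Merchant: a single ciphertext. The Merchant decrypts the
    # aggregate and reports the total back to Google.
    print(merchant_priv.decrypt(encrypted_total))   # -> 4599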

So, in the scheme I described:

1) Google still learns who is purchasing at which merchant.

2) Google does not learn individual amounts.

3) Merchants can't perform the same calculation to learn which users saw their ads unless Google sends s_j^(T*G) back to them (not pictured in the slides).


Either side learning that one of their customers is also a customer of the other company would be a significant privacy issue in some scenarios. If comparison identifiers are derived from an email address then there would likely be cases where one party also learns a portion of the login credentials their customer is using at the other company.


It's not new, and it's a very common process. A personally identifying variable is computed in the same way on both sides (usually by a third party), then hashed. The hashed values are then compared/matched.
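
A minimal sketch of that flow (normalization rules and salting vary between vendors; this just shows the general shape, and the identifiers are made up):

    import hashlib

    def match_key(identifier):
        # Normalize the identifier the same way on both sides, then hash it.
        return hashlib.sha256(identifier.strip().lower().encode()).hexdigest()

    party_a = {match_key(e) for e in ["Alice@Example.com", "bob@example.com"]}
    party_b = {match_key(e) for e in ["bob@example.com", "carol@example.com"]}

    shared = party_a & party_b   # hashes on both sides => shared customers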


Which is better than exchanging the raw personal identifiers, but still isn't "safe" from a privacy point of view. One or both parties often hold PII and other personal info, and they know which of it is associated with a given hash. So what they learn through exchanging hashes is also (linked to) PII.

Furthermore, although reversing a one-way hash is problematic, computing hashes for known inputs (such as email addresses) is straightforward. So it is possible to just test or probe datasets for known identifiers and persons. The potential for hash collisions is of little comfort.
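
To illustrate: given a list of candidate addresses, identifying which of them appear in a "hashed" dataset is just a lookup, no reversal required (addresses below are made up).

    import hashlib

    def match_key(identifier):
        return hashlib.sha256(identifier.strip().lower().encode()).hexdigest()

    # Hashed identifiers received from a partner, nominally anonymized.
    received = {match_key("carol@example.com"), match_key("dave@example.com")}

    # Anyone with a candidate list just hashes each candidate and checks
    # membership -- no need to reverse the hash.
    candidates = ["alice@example.com", "carol@example.com", "eve@example.com"]
    print([c for c in candidates if match_key(c) in received])
    # -> ['carol@example.com']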


What makes you think they are just comparing hashes and not using the Diffie-Hellman-style algorithm they say they have been using?


No idea how Google is doing it. Was referring to how others have been doing it for years.





