
A Googler gave a talk at Real World Crypto in January describing a system that might back this. The relevant part of the talk starts around the 10 minute mark: https://www.youtube.com/watch?v=ee7oRsDnNNc&t=10m1s

In short, they're able to compute the intersection of the set of users who have viewed an ad with the set of people who purchased a product in a store, without either party disclosing their side of the set.




After studying the slides, it looks like they do a Diffie-Hellman-like exchange of user identifiers.

g_i = Google's identifier for customer i. s_j = the Merchant's identifier for customer j.

If g_i == s_j then customer i == j. These identifiers might be phone numbers or email addresses or other identifiers that both parties have.

Neither party wants the other to learn any identifiers it doesn't already know about.

So Google picks a random secret value G and sends g_i^G to the Merchant. The Merchant picks a random secret value T and sends g_i^(G*T) back to Google. Additionally, the Merchant sends s_j^T to Google.

Then Google calculates s_j^(T*G). If s_j^(T*G) == g_i^(G*T) then s_j == g_i and i == j. So now Google knows the exact set of its users who purchased something at the Merchant.
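
To make the matching step concrete, here is a minimal sketch in Python. The modulus, identifiers, and the names hash_to_group/blind are all illustrative and not from the talk; a real deployment would use a properly analysed Diffie-Hellman or elliptic-curve group. The point is just that the two blindings commute, so equal identifiers produce equal double-blinded values.

    import hashlib
    import secrets

    # Toy modulus for illustration only (the 2^255 - 19 field prime); a real
    # system would use a well-analysed DH or elliptic-curve group.
    P = 2**255 - 19

    def hash_to_group(identifier):
        # Map an identifier (email, phone number, ...) to a group element.
        digest = hashlib.sha256(identifier.strip().lower().encode()).digest()
        return int.from_bytes(digest, "big") % P

    def blind(element, secret):
        # Raise a group element to a party's secret exponent.
        return pow(element, secret, P)

    # Each party's own customer identifiers (the g_i and s_j above).
    google_ids = ["alice@example.com", "bob@example.com", "carol@example.com"]
    merchant_ids = ["bob@example.com", "dave@example.com"]

    G = secrets.randbelow(P - 3) + 2   # Google's secret exponent
    T = secrets.randbelow(P - 3) + 2   # Merchant's secret exponent

    # Google -> Merchant: g_i^G
    google_blinded = [blind(hash_to_group(g), G) for g in google_ids]

    # Merchant -> Google: (g_i^G)^T in the same order, plus its own s_j^T
    double_blinded = [blind(x, T) for x in google_blinded]
    merchant_blinded = [blind(hash_to_group(s), T) for s in merchant_ids]

    # Google raises each s_j^T to G; s_j^(T*G) == g_i^(G*T) implies s_j == g_i,
    # so Google learns which of its own users bought something at the Merchant.
    merchant_double = {blind(x, G) for x in merchant_blinded}
    matched = [g for g, y in zip(google_ids, double_blinded) if y in merchant_double]
    print(matched)   # -> ['bob@example.com']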

Additionally, for each s_j^T, the Merchant sends a homomorphically encrypted value for the amount that customer spent. Google can then perform a homomorphic addition over the encrypted values for only the intersection it calculated. The Merchant can then decrypt the result to get the total sum and share it back with Google.
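
And a sketch of that aggregation step, using the python-paillier ("phe") package as a stand-in additively homomorphic scheme (the talk doesn't say which scheme is actually used). The records and the True/False flags are placeholders for the blinded identifiers and the intersection test from the previous sketch.

    # pip install phe
    from phe import paillier

    # The Merchant holds the keypair, so only the Merchant can open the sum.
    merchant_pub, merchant_priv = paillier.generate_paillier_keypair()

    # Merchant -> Google: an encrypted spend amount (in cents) alongside each
    # blinded identifier s_j^T. The flags stand in for the intersection test.
    records = [
        ("s_1^T", merchant_pub.encrypt(4599), True),    # matched some g_i^(G*T)
        ("s_2^T", merchant_pub.encrypt(1250), False),   # no match
    ]

    # Google homomorphically adds only the ciphertexts in the intersection;
    # it never sees the individual plaintext amounts.
    encrypted_total = merchant_pub.encrypt(0)
    for _blinded_id, enc_amount, in_intersection in records:
        if in_intersection:
            encrypted_total = encrypted_total + enc_amount

    # Google -> Merchant: a single ciphertext. The Merchant decrypts the
    # aggregate and reports the total back to Google.
    print(merchant_priv.decrypt(encrypted_total))   # -> 4599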

So, in the scheme I described:

1) Google still learns who is purchasing at which merchant.

2) Google does not learn individual amounts.

3) Merchants can't perform the same calculation to learn which users saw their ads unless Google sends s_j^(T*G) back to them (not pictured in the slides).


Either side learning that one of their customers is also a customer of the other company would be a significant privacy issue in some scenarios. If comparison identifiers are derived from an email address then there would likely be cases where one party also learns a portion of the login credentials their customer is using at the other company.


It's not new, and it's a very common process. A personally identifying variable is computed in the same way on both sides (usually by a third party), then hashed. The hashed values are then compared/matched.
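
A minimal sketch of that flow (normalization rules and salting vary between vendors; this just shows the general shape, and the identifiers are made up):

    import hashlib

    def match_key(identifier):
        # Normalize the identifier the same way on both sides, then hash it.
        return hashlib.sha256(identifier.strip().lower().encode()).hexdigest()

    party_a = {match_key(e) for e in ["Alice@Example.com", "bob@example.com"]}
    party_b = {match_key(e) for e in ["bob@example.com", "carol@example.com"]}

    shared = party_a & party_b   # hashes on both sides => shared customers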


Which is better than exchanging the raw personal identifiers, but still isn't "safe" from a privacy point of view. One or both parties often hold PII and other personal info, and they know which of it is associated with a given hash. So what they learn through exchanging hashes is also (linked to) PII.

Furthermore, although reversing a one-way hash is problematic, computing hashes for known inputs (such as email addresses) is straightforward. So it is possible to just test or probe datasets for known identifiers and persons. The potential for hash collisions is of little comfort.
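
To illustrate: given a list of candidate addresses, identifying which of them appear in a "hashed" dataset is just a lookup, no reversal required (addresses below are made up).

    import hashlib

    def match_key(identifier):
        return hashlib.sha256(identifier.strip().lower().encode()).hexdigest()

    # Hashed identifiers received from a partner, nominally anonymized.
    received = {match_key("carol@example.com"), match_key("dave@example.com")}

    # Anyone with a candidate list just hashes each candidate and checks
    # membership -- no need to reverse the hash.
    candidates = ["alice@example.com", "carol@example.com", "eve@example.com"]
    print([c for c in candidates if match_key(c) in received])
    # -> ['carol@example.com']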


What makes you think they are just comparing hashes and not using the Diffie-Hellman-style algorithm they say they have been using?


No idea how Google is doing it. Was referring to how others have been doing it for years.





