They were tapping replication traffic on a database that included login IP addre...

lern_too_spel · 2024-03-24T22:05:40 1729723960

I missed that leak. Any chance you have a link for me to fill in my gap?

mike_hearn · 2024-03-25T09:41:58 1729723960

Slide 5 (Serendipity - New protocols) in this presentation:

https://github.com/iamcryptoki/snowden-archive/blob/master/d...

It's heavily redacted but the parts that are visible show they were targeting BigTable replication traffic (BTI_TabletServer RPCs) for "kansas-gaia" (Gaia is their account system), specifically the gaia_permission_whitelist table which was one of the tables used for the login risk analysis. You can see the string "last_logins" in the dump.

Note that the NSA didn't fully understand what they were looking at. They thought it was some sort of authentication or authorization RPC, but it wasn't.

In order to detect suspicious logins, e.g. from a new country or from an IP that's unlikely to be logging in to accounts, the datacenters processing logins needed to have a history of recent logins for every account. Before around 2011 they didn't have this - such data existed but only in logs processing clusters. To do real time analytics required the data to be replicated with low latency between clusters. The NSA were delighted by this because real-time IP address info tied to account names is exactly what they wanted. They didn't have it previously because a login was processed within a cluster, and user-to-cluster traffic was protected by SSL. After the authentication was done inter-cluster traffic related to a user was done using opaque IDs and tokens. I know all about this because I initiated and ran the anti-hijacking project there in about 2010.

The pie chart on slide 6 shows how valuable this traffic was to them. "Google Authorization, Security Question" and "gaia // permission_whitelist" (which are references to the same system) are their top target by far, followed by "no content" (presumably that means failed captures or something). The rest is some junk like indexing traffic that wouldn't have been useful to them.

Fortunately the BT replication traffic was easy to encrypt, as all the infrastructure was there already. It just needed a massive devops and capacity planning effort to get it turned on for everything.