Tell-all telephone – Six months of phone metadata visualized (zeit.de)
568 points by danielhunt on July 1, 2013 | 72 comments



You can infer some amazing things from simple metadata. I spent six months in an R&D team at a large mobile telco, with the task of trying to infer as much as possible from anonymous customer data just like this.

Figuring out where you live and work, to a reasonable accuracy, is quite easy; you simply look at where the most outgoing calls/SMS originate from at certain hours of the day over an extended period.
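Here is a minimal sketch of that home/work heuristic. It assumes a hypothetical record format of `(phone_id, timestamp, cell_id)` for each outgoing call or SMS. The hour windows (`NIGHT_HOURS`, `WORK_HOURS`) are illustrative guesses, not tuned parameters:

```python
from collections import Counter, defaultdict
from datetime import datetime

# Illustrative hour windows (assumptions, not tuned values).
NIGHT_HOURS = set(range(0, 7)) | {21, 22, 23}  # probably at home
WORK_HOURS = set(range(10, 17))                # probably at work, weekdays only


def infer_home_work(records):
    """records: iterable of (phone_id, datetime, cell_id) for outgoing calls/SMS.

    Returns {phone_id: (home_cell, work_cell)}, taking the most frequent
    cell in each time window; None if a window has no activity.
    """
    night = defaultdict(Counter)
    work = defaultdict(Counter)
    for phone_id, ts, cell in records:
        if ts.hour in NIGHT_HOURS:
            night[phone_id][cell] += 1
        elif ts.hour in WORK_HOURS and ts.weekday() < 5:  # Monday..Friday
            work[phone_id][cell] += 1

    def top(counter):
        return counter.most_common(1)[0][0] if counter else None

    return {p: (top(night[p]), top(work[p])) for p in set(night) | set(work)}


records = [
    ("A", datetime(2013, 7, 1, 22, 5), "cell-12"),
    ("A", datetime(2013, 7, 2, 23, 10), "cell-12"),
    ("A", datetime(2013, 7, 1, 11, 0), "cell-40"),
    ("A", datetime(2013, 7, 2, 14, 30), "cell-40"),
    ("A", datetime(2013, 7, 2, 14, 45), "cell-07"),
]
print(infer_home_work(records))  # {'A': ('cell-12', 'cell-40')}
```

The idea is that over months of data, the most frequent cell in each window tends to be stable. Mapping cell IDs to coordinates would be a separate step.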

We built up our own social graph. You treat calls and text messages as directed edges and phone numbers as nodes. These were fascinating to look at.
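A minimal sketch of such a graph follows. It assumes each event is a hypothetical `(caller, callee, kind)` tuple. `reciprocal_pairs` is just one illustrative query (people who contact each other in both directions), not necessarily one that was actually run:

```python
from collections import defaultdict


def build_call_graph(events):
    """events: iterable of (caller, callee, kind), kind being "call" or "sms".

    Returns a directed graph {src: {dst: count}}; each call or SMS adds 1 to
    the weight of the edge src -> dst.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst, _kind in events:
        graph[src][dst] += 1
    return graph


def reciprocal_pairs(graph):
    """Unordered pairs of nodes with edges in both directions."""
    pairs = set()
    for src, targets in graph.items():
        for dst in targets:
            if src in graph.get(dst, {}):
                pairs.add(tuple(sorted((src, dst))))
    return pairs


events = [("A", "B", "call"), ("B", "A", "sms"), ("A", "C", "call"), ("A", "B", "sms")]
g = build_call_graph(events)
print(g["A"]["B"])          # 2
print(reciprocal_pairs(g))  # {('A', 'B')}
```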

You can even try to guess when someone gets off a plane. When a plane lands you'll suddenly see lots of incoming undelivered text messages as people turn their phones back on. If a node was last seen in a far away cell, but then reappears in this group, you can cross-correlate with arrival times and make a reasonable guess.
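A rough sketch of the two steps follows. It assumes hypothetical inputs:

- `deliveries` as `(phone_id, cell_id, delivered_at)` for messages queued while the phone was off.
- A `last_seen_cell` lookup.
- A `distance_km` function between cells.

The thresholds (`window`, `min_count`, `min_km`) are arbitrary, and the cross-correlation with published arrival times is left out:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_arrival_bursts(deliveries, window=timedelta(minutes=10), min_count=20):
    """deliveries: iterable of (phone_id, cell_id, delivered_at) for SMS that
    were held by the network while the phone was switched off.

    Buckets deliveries into fixed time windows per cell and keeps the buckets
    where at least `min_count` distinct phones came back online together.
    """
    buckets = defaultdict(set)
    for phone, cell, ts in deliveries:
        # Round the timestamp down to the start of its window.
        start = datetime.min + ((ts - datetime.min) // window) * window
        buckets[(cell, start)].add(phone)
    return {key: phones for key, phones in buckets.items() if len(phones) >= min_count}


def likely_flyers(bursts, last_seen_cell, distance_km, min_km=500):
    """Phones in a burst whose previous cell was at least `min_km` away."""
    flyers = set()
    for (cell, _start), phones in bursts.items():
        for phone in phones:
            prev = last_seen_cell.get(phone)
            if prev is not None and distance_km(prev, cell) >= min_km:
                flyers.add(phone)
    return flyers
```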


Interestingly, what you describe is probably not legal under EU privacy laws. People are horrified by the NSA merely collecting this data, and yet you calmly describe this process.

Your opinions are not given in your post - you're not saying whether it's good or bad to do this - but it's clear that the company you worked for didn't see doing this as evil.

I find it fascinating that this kind of data mining has been going on for years and that opposition has been so quiet.

(Please, this post is not any judgement about you!)


All the telcos collect this data as far as I know. They're allowed to for the purposes of improving and maintaining their network. A few crunch it for marketing purposes but this has to be opt-in (not that customers would have any idea what that might entail, even if the privacy policy describes it broadly). I can't comment on the legality of the project I worked on, but I assume it was checked out by legal counsel.

I personally wouldn't want my data mined in this way. I don't retain any brand loyalty; let's put it that way.


Does the EU actually have laws against collecting this data without opt-in for marketing?

On a related note: it would be really interesting to see privacy laws visualized around the world.


It may well be legal, if part of the stated purpose of collecting the data, as agreed with the customer in the Ts and Cs that they thoroughly read through and agreed to, was to collect data for research, network development, service development and other woolly terms that cover this kind of R&D.

What many companies do is anonymise the data: remove the actual phone numbers/account details and replace them with dummy numbers. While not ideal (backwards matching is possible due to the clues the data "gives up"), it's probably safer than it sounds.
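Here is a sketch of that kind of substitution, assuming a simple in-memory lookup table (the `Pseudonymizer` class is hypothetical). By design it keeps the graph structure intact, which is exactly what makes backwards matching possible:

```python
import secrets


class Pseudonymizer:
    """Replace each real phone number with a random surrogate ID.

    The same number always maps to the same surrogate while the table
    exists, so who-called-whom survives the substitution.
    """

    def __init__(self):
        self._table = {}

    def surrogate(self, number: str) -> str:
        if number not in self._table:
            self._table[number] = "anon-" + secrets.token_hex(8)
        return self._table[number]

    def destroy_mapping(self):
        # Without the table there is no direct way back to the real number,
        # but the call pattern attached to each surrogate is unchanged.
        self._table.clear()
```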


There's also the matter of who's doing it. It is, imo, one thing for the company whose services I use to collect data on my use of their equipment (for gathering network performance data, troubleshooting, and billing info) that they use themselves and don't hand over to other parties.

It's another thing to have government agencies snooping in such data for entirely different purposes.


I think his point was that even anonymized data isn't as anonymous as you think.


It's basically what Google Now uses to figure out your home and work locations and prepare directions and traffic for you.


> INterestingly what you describe is probably not legal under EU privacy laws

Not only is it legal, it's essential for telcos to expand or enhance their cell coverage.


If anonymisation is reversible (it seems clear that it is reversible, if it was anonymised at all), the data likely again falls under the Data Protection Directive in the EU.

There, personal data legally requires explicit permission for each specific purpose it's used for and cannot be stored any longer than is necessary for that purpose.

https://en.wikipedia.org/wiki/Data_Protection_Directive#Prin...


Improving cell coverage / planning for growth is a specific purpose. You can then argue how long to keep the metadata for - any reasonable argument starts in years.

I think we need to accept that metadata and all digital comms are communications in public, and that we need social conventions backed by law so that certain things are politely not read unless a warrant is served.


>Improving cell coverage / planning for growth is a specific purpose.

Agree, so you just need contractual permission (not hard to get). You can't decide later that you want to use it for some other apparently-innocuous reason.

>any reasonable argument starts in years.

Cell coverage data from years ago is relevant to today's growth? Although that might be enough to get you out of a legal hole, I find that highly dubious.

The law is there, but it's not understood, not clear enough, nor enforceable enough for commerce to fall in line.


What, your company doesn't make year-over-year comparisons? Many telcos, at least in the US, don't own every tower; they rent some. If they can see that a tower is underused, they can ask whether that is a downward trend and whether they should not renew the contract for that tower location. Then there are the planning aspects: anticipating heavy usage for major events such as concerts and festivals. You can't compare how your system handles the added demand as an event grows if you don't have the data.
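As an illustration of that kind of comparison, here is a sketch that assumes yearly traffic totals keyed by hypothetical `(tower_id, year)` pairs. The 10% decline threshold and the two-comparison window are arbitrary:

```python
from collections import defaultdict


def yoy_change(usage):
    """usage: {(tower_id, year): traffic}. Returns {(tower_id, year): fractional
    change versus the previous year}, where the previous year is available."""
    changes = {}
    for (tower, year), traffic in usage.items():
        prev = usage.get((tower, year - 1))
        if prev:  # skip missing years and zero-traffic years
            changes[(tower, year)] = (traffic - prev) / prev
    return changes


def renewal_candidates(usage, threshold=-0.10, min_years=2):
    """Towers whose traffic fell by at least |threshold| in each of the last
    `min_years` year-over-year comparisons."""
    per_tower = defaultdict(list)
    for (tower, _year), change in sorted(yoy_change(usage).items()):
        per_tower[tower].append(change)
    return sorted(
        tower
        for tower, changes in per_tower.items()
        if len(changes) >= min_years and all(c <= threshold for c in changes[-min_years:])
    )
```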


Presumably one would need to review population growth, movement, etc trends.


This is exactly what the NSA have been doing, according to an earlier whistleblower; see e.g. my submission to HN here with his keynote from HOPE 9 (2012):

https://news.ycombinator.com/item?id=5964403

edit: To save a click: http://www.youtube.com/watch?v=dxnp2Sz59p8 [NSA whistleblower William Binney Keynote at HOPE 9 (2012) [video]]


I went to a presentation by DONG Energy [1] where they were discussing how, with their in-progress upgrade to remote-reporting, per-residence electric meters, they can soon infer all sorts of things about people's apartments and daily habits from the distinctive patterns of electric usage. Not just aggregate usage like inferring when someone's awake or asleep, but in much more detail based on the distinctive patterns different devices make.
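The general technique is usually called non-intrusive load monitoring. Below is a heavily simplified, edge-based sketch. The appliance wattages in `SIGNATURES` are made-up illustrative values, and this is not a description of DONG Energy's prototype:

```python
# Hypothetical step sizes (watts) seen when each appliance switches on.
SIGNATURES = {"kettle": 2000, "fridge compressor": 120, "television": 90, "washing machine heater": 1800}


def detect_events(readings, min_step=50):
    """readings: list of (seconds, watts) from a whole-house meter.
    Returns (time, delta_watts) wherever consecutive readings jump by >= min_step."""
    events = []
    for (_t0, w0), (t1, w1) in zip(readings, readings[1:]):
        delta = w1 - w0
        if abs(delta) >= min_step:
            events.append((t1, delta))
    return events


def label_events(events, signatures=SIGNATURES, tolerance=0.15):
    """Label switch-on steps with the appliance whose signature is closest,
    if it is within `tolerance` (as a fraction of that signature)."""
    labels = []
    for t, delta in events:
        if delta <= 0:
            continue  # switch-off events are ignored in this sketch
        name, watts = min(signatures.items(), key=lambda item: abs(item[1] - delta))
        if abs(watts - delta) <= tolerance * watts:
            labels.append((t, name))
    return labels


readings = [(0, 150), (10, 150), (20, 2160), (30, 2160), (40, 160), (50, 280)]
print(label_events(detect_events(readings)))  # [(20, 'kettle'), (50, 'fridge compressor')]
```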

They do seem sensitive to privacy fears (perhaps partly because the regulatory climate forces them to be), but the level of detail they were able to get out of the electric data in a prototype system was quite eye-opening. They had some ideas about using it for consumer self-education, e.g. feeding it back into a small display near the meter that would make energy-saving suggestions. But even that could get creepy, because it could make suggestions about specific devices you owned, when you never told the energy company that you owned them!

[1] An unfortunate acronym, from Danish Oil and Natural Gas


Could you just clarify something? You state this data is anonymous, but that you use phone numbers as nodes? Do you mean some sort of ID number representing phone numbers, or actual phone numbers? I ask because I wouldn't consider phone numbers anonymous.


They could easily SHA1 the phone numbers to "Anonymize" them.


The input space is too small for SHA1 to effectively anonymize. The NANP, for example, has fewer than 10^10 possible numbers (they are ten digits long); it would be a very simple task to create a rainbow table mapping every possible phone number to its corresponding SHA1 hash.

For the same reason, you can't just use a simple cryptographic hash to "anonymize" data such as birthdates, zip codes, SSNs, or PINs.
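To make the point concrete, here is a sketch that reverses a SHA-1 "anonymized" number by brute force. To keep it fast, it only enumerates one hypothetical NPA-NXX block (the defaults `212`/`555` are arbitrary), which is 10,000 numbers. The full NANP is under 10^10 candidates, so it is the same loop run for longer:

```python
import hashlib


def sha1_hex(number: str) -> str:
    """The naive "anonymization": an unsalted SHA-1 of the digits."""
    return hashlib.sha1(number.encode("ascii")).hexdigest()


def reverse_hash(target_hash: str, npa: str = "212", nxx: str = "555"):
    """Try all 10,000 line numbers in one NPA-NXX block; return the match or None."""
    for line in range(10_000):
        candidate = f"{npa}{nxx}{line:04d}"
        if sha1_hex(candidate) == target_hash:
            return candidate
    return None


hashed = sha1_hex("2125550123")
print(reverse_hash(hashed))  # 2125550123
```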

Using a key derivation function with a very high cost factor can mitigate this to some extent (e.g. making it take 5 seconds on an average CPU to generate the hash from a phone number), but it by no means makes for secure anonymization; eventually computing power will catch up.
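Here is back-of-envelope arithmetic for the 5-second example, assuming 10^10 candidates as an upper bound on NANP numbers. `slow_hash` uses PBKDF2 from Python's standard library as the slow function. The iteration count needed for about 5 seconds depends on hardware and is left as a parameter:

```python
import hashlib

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_years(keyspace: int, seconds_per_hash: float, cores: int = 1) -> float:
    """Years needed to hash every candidate once, split evenly across `cores`."""
    return keyspace * seconds_per_hash / cores / SECONDS_PER_YEAR


def slow_hash(number: str, salt: bytes, iterations: int) -> str:
    """Deliberately slow PBKDF2-HMAC-SHA256; raise `iterations` until one call takes ~5 s."""
    return hashlib.pbkdf2_hmac("sha256", number.encode("ascii"), salt, iterations).hex()


print(round(brute_force_years(10**10, 5)))                         # 1584
print(round(brute_force_years(10**10, 5, cores=10_000) * 365.25))  # 58 (days)
```

One core would need roughly 1,584 years, but 10,000 cores cut that to under two months. That is the "computing power will catch up" problem in numbers.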

Encrypting the number with a secret key (or using an HMAC), and destroying the key after the anonymization takes place might be a reasonably secure way of doing this, however.
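Here is a sketch of the keyed approach using HMAC-SHA256, assuming rows of hypothetical `(caller, callee, timestamp)` tuples. The key exists only for one batch, so pseudonyms are consistent within the batch but cannot be recomputed, or linked to another batch, afterwards. Note that `del` only drops Python's reference; securely erasing key material takes more care than this:

```python
import hashlib
import hmac
import secrets


def _tag(key: bytes, number: str) -> str:
    """Keyed pseudonym for one phone number."""
    return hmac.new(key, number.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records):
    """records: iterable of (caller, callee, timestamp).

    A fresh random key is generated for this batch and never stored, so the
    pseudonyms are consistent within the batch but cannot be recomputed later.
    """
    key = secrets.token_bytes(32)
    out = []
    for caller, callee, ts in records:
        out.append((_tag(key, caller), _tag(key, callee), ts))
    del key  # drops the reference only; it does not securely wipe memory
    return out
```

The graph structure still survives, which is why, as noted further down the thread, this alone does not stop re-identification.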


Maybe just salt each number with a random salt?


Yep, we effectively did this. But as my comment alludes to, it doesn't really matter. You have enough to uniquely identify someone.


The argument isn't that meta-data can't be used to get a lot of information about someone. The argument is that in the U.S., meta-data isn't protected information. Call meta-data is not your information, but information the telephone company keeps about you. In the U.S., the 4th amendment does not protect those sorts of records: http://en.wikipedia.org/wiki/Smith_v._Maryland. Your cell phone, which you use voluntarily, gives the phone company tremendous information about you, and under U.S. law nothing keeps the government from getting that information from the phone company.

Does call meta-data give the government a lot of information? Yes. Does it give the government too much information? Quite possibly. But arguing shrilly about how collecting call meta-data is "illegal" is counter-productive. Maybe it should be illegal, but you can't start the process of making it so by proceeding from an incorrect premise. And you can't dismiss the goal of making it illegal by arguing that the government is already ignoring the law, with reference to activity where the government is clearly attempting to stay within the law, even if it is pushing the boundaries as much as it can.


This mixes up two different points.

One is what the general public thinks about the importance of metadata; the OP shows that it is a bit more than some may think, so arguing about the legal aspects is a bit beside the point.

That said, if you want to argue the legality, it isn't that clear-cut, either. The problem is that while Smith v. Maryland may resolve the question of the constitutionality of the collection, it does not answer the question of its legality. In addition to a search or seizure being constitutional, there generally also needs to be a statute authorizing the search or seizure.

Unfortunately, whether there is such a statute is highly dubious. The leaked court order used an extremely suspect interpretation of section 215 of the Patriot Act to justify seizure of the phone records; which is what Senators Ron Wyden and Mark Udall kept pointing out. Seizure under the electronic surveillance provisions of 50 USC §1801 etc. does not solve the problem, either, because 50 USC §1801(n) defines "contents" for the purpose of electronic surveillance to include metadata. And seizure under the Pen Register Act would require the government to certify that the information obtained is likely to be relevant for an ongoing criminal investigation.

This means that while Congress could have authorized the NSA to collect the connection data in such a fashion without such a law being unconstitutional, it is at the very least questionable whether Congress actually did such a thing.


To put the link into perspective: that article stems from a debate around the German "Vorratsdatenspeicherung", an attempt to put a law into place that forces telecommunication service providers to store metadata (of telephone calls and, technically curious, emails) for six months. Law enforcement would then be able to access metadata for that timespan. Law enforcement could already query all available data before that law was discussed, but the data wasn't necessarily available.

FWIW, the law was put into place and revoked by the German constitutional court, the "Bundesverfassungsgericht".

Of course, German data privacy law works a bit differently, too. Metadata is covered as long as it points (possibly indirectly) to natural persons. Once the data is no longer needed for any purpose covered by the business it stems from (e.g. when the dispute deadline for a telephone bill has passed), it has to be deleted.

The whole debate is between three and five years old here in Germany.


I don't think the argument is whether metadata is "protected."

I think that the argument is that the US is collecting whatever they want even on lawful citizens, not being forthcoming, and arguing it's all legal whether it is or isn't.

Also - your notion that something not explicitly "protected" is fair game scares me.


The purpose of the web-page is to illustrate how much information you can get about a person with just meta-data.

The law should be a reflection of our morals, not the other way around; the reverse would be a recipe for disaster.


Totally agreed. The law should reflect our morals. But we judge the legality of an action by the law, not our morals.

Moreover, the problem for privacy advocates is that the moral debate is even less clear than the legal one. Being able to declare the NSA's actions straight-up unconstitutional under the 4th amendment would avoid the mess of resorting to the democratic process to determine what the people, as a whole, really thought about surveillance.


> under U.S. law nothing keeps the government from getting that information from the phone company.

To the extent that you can actually differentiate between them. Which was hard enough when AT&T ruled.


"Metadata doesn't matter" to me seems to be a really poor strawman. Maybe a small minority of people think that, but I'm pretty sure most people are smart enough to realize that if it "didn't matter" the NSA wouldn't be collecting it to begin with.

Also, I don't believe that it has been shown that location information has been collected. That claim is conjecture only. We've seen a lot of conjecture related to these leaks that has been taken for fact. Sometimes it is hard to tell them apart.


I call it the "just the tip" fallacy.


And that's just from the phone metadata. Imagine how much more they can do with all your online info from all the services you're using, all the blogs you're commenting on, and so on.

The same person being talked about above wrote this article in NYTimes yesterday:

http://www.nytimes.com/2013/06/30/opinion/sunday/germans-lov...


Thanks, that is a great post and well worth reading! Especially the part that describes possible consequences of trading privacy for security (Nazis, Communists).


What a remarkable visualisation - this is a clear demonstration of just how intrusive these metadata records can be. If they're not controlled by law, they should be.


Malte Spitz (the guy whose data you see) is a German Green Party politician and did a TED presentation in 2012 http://www.ted.com/talks/malte_spitz_your_phone_company_is_w...


I would encourage anybody who hasn't watched this to do so. It's a very interesting video, especially for younger people who didn't grow up during that time period.


Let's not forget that combined metadata from millions of people allow much greater detail than this (who you meet, talk to regularly, share interests with, are likely to run into ...).


I'm afraid the actual definition of "meta-data" is up to interpretation in the context of IP communication.

What if the NSA considers not only IP source & destination as "metadata" but also anything down to the application layer that is not strictly content? Like the HTTP GET line or HTTP headers.
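For illustration, here is a sketch of where that line would fall in a plain HTTP/1.x request (the sample request is hypothetical). Everything before the blank line, including the request line with its query string, is arguably "not strictly content", yet it can say a lot:

```python
def split_http_request(raw: bytes):
    """Split a raw HTTP/1.x request into (request_line, headers, body).

    Header names are lower-cased because HTTP treats them case-insensitively.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body


raw = (b"GET /search?q=divorce+lawyer HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"User-Agent: demo\r\n\r\n")
request_line, headers, body = split_http_request(raw)
print(request_line)  # GET /search?q=divorce+lawyer HTTP/1.1
```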


I think you can take that as a given. If you look at the GCHQ leak, they're basically just recording everything (including content) for 3 days and keeping headers for 30 days (shared with the NSA, of course). That alone would reveal most of the websites visited from an IP address (records that take hardly any space to store but are still really intrusive).

The only things preventing this from being a total capture of all information (to be sifted through later) are technical issues with storage, not moral or legal ones.


What do you think graph databases with trillions of connections are used for? The real fun will start after someone leaks a couple of terabytes of tracking data.


Of course it matters, otherwise they wouldn't collect it.


Exactly.


Well, if location data is considered part of this "metadata", then I don't see how anyone could argue against the dangers of this.

My physical location in the real world I consider way more private in matters of wide scale tracking than what I write or say.

For instance, I hardly ever let my browser determine my location and send it to some site, it's none of their business where I am, and if I want the local weather they can get the name of the city I'm at.

But I was hoping this article would be about another, far more dangerous (because far more information-rich) type of "metadata": social graphs and contact lists. The problem with this is that humans underestimate the depth of this kind of data, because we're not really well-equipped to reason about it.

If you have a table that consists of (time, location) records, it's pretty easy to envision what sort of information could be extracted from this data. Add a few more fields, and it becomes harder, maybe you need some creativity and statistics, but it's all basic detective work.

A free form directed graph (such as a social graph or collection of contact lists) doesn't look like a table at all (well, you can represent it as a table, but that won't make you much wiser). It's in fact a very high-dimensional object.

The older generation here may remember first encountering the WWW, when you could only navigate it by clicking links. I got this sense of vastness, perhaps even helplessness. They don't call it hypertext for nothing. The sense of vastness comes because clicking and navigating those links gives an idea of moving through a space. Except this space is in some sense "larger" than our usual 3D space. Every door (link) can open into every room, regardless of whether it would be possible in a physical space.

This is why those "graph of (part of) the Internet" pictures you sometimes see are almost always a tangled clutter of strings, usually vaguely ball-shaped. There is no sensible representation of this type of inter-connected data. You can't make a hierarchy or a map, at least not in the general case (and the thing you want to reason about is the general case; most of those graphs are exponential small-world graphs, highly inter-connected).

Same thing for social / contact list graphs. Except they usually don't have web-rings or directories (you can sometimes make them like FB does, but they aren't generally available, again the general case).

So okay, we're not really good at keeping large graph networks of "friends of friends of friends" and other relationships in our heads and reasoning about them. We're really not. What you think you can reason about in those graphs is just scratching the surface.

Computers, however, and Big Data Machine Learning algorithms in particular, have no problems at all with this type of data. An algorithm never lived in a 3D space, it doesn't care if a dataset makes no sense as a physical configuration of nodes, in order to navigate it and extract information from it.

Another important distinction: people tend to think of these social graphs as labeled nodes with edges between them. Which is correct, in a sense. But it gives the impression that the labels are more important than they actually are. This may sound weird, but in the building/room analogy, if you have millions of rooms and every room is directly connected to 50-200 other rooms, the shape of the paths between the nodes and the way they are connected becomes a vastly more information-rich data source than the actual values of the nodes' labels.

They don't need your name or your photo, the local shape of your social graph is a highly unique fingerprint of whoever you are.
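Here is a toy illustration of a label-free fingerprint, assuming an undirected graph stored as `{node: set(neighbours)}`. `local_fingerprint` uses only a node's degree, its neighbours' degrees, and the links among its neighbours. Real systems would use far richer features, so treat this as a sketch of the idea, not a working de-anonymizer:

```python
from collections import Counter


def local_fingerprint(graph, node):
    """Describe a node's surroundings without using any labels: its degree,
    the sorted degrees of its neighbours, and how many neighbour pairs are linked."""
    nbrs = graph[node]
    neighbour_degrees = tuple(sorted(len(graph[n]) for n in nbrs))
    # a < b counts each unordered pair of neighbours once.
    links_among_nbrs = sum(1 for a in nbrs for b in nbrs if a < b and b in graph[a])
    return (len(nbrs), neighbour_degrees, links_among_nbrs)


def match_across_networks(g1, g2):
    """Pair up nodes whose fingerprint occurs exactly once in each graph."""
    f1 = {n: local_fingerprint(g1, n) for n in g1}
    f2 = {n: local_fingerprint(g2, n) for n in g2}
    c1, c2 = Counter(f1.values()), Counter(f2.values())
    unique2 = {fp: n for n, fp in f2.items() if c2[fp] == 1}
    return {n: unique2[fp] for n, fp in f1.items() if c1[fp] == 1 and fp in unique2}
```

In a small graph, symmetric nodes share fingerprints and stay ambiguous. The claim above is that in large, messy real-world graphs, most local shapes are unique.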

And you can delete Facebook, but on the next social network you sign up for (or in any of the other social graphs you're generating: email/IM contact lists, etc.), this fingerprint will echo, and in many cases it will be similar enough to clearly indicate that this is the exact same person. No names necessary. (This may be a bit harder if you keep a strictly separate business persona and social persona, but there are still some unexpected artifacts for an ML algorithm to pick up even in these cases.) If you're not on a network at all, your presence can be extrapolated from the "hole" you left in the graph (all your friends are there, with their particular local graph shapes, but one node is missing). That is, even if you have nothing to hide, you will be leaking info about those who do.
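One concrete way such a "hole" can show up assumes that members upload their address books (a hypothetical `contact_lists` mapping). Identifiers that appear in several members' contacts but belong to no member outline an absent person's local graph:

```python
from collections import defaultdict


def shadow_profiles(members, contact_lists, min_mentions=2):
    """members: set of identifiers of people on the network.
    contact_lists: {member: set(identifiers in that member's uploaded address book)}.

    Returns {non_member_identifier: set(members who list it)}: the outline
    that each absent person leaves in the graph.
    """
    mentions = defaultdict(set)
    for member, contacts in contact_lists.items():
        for ident in contacts:
            if ident not in members:
                mentions[ident].add(member)
    return {ident: who for ident, who in mentions.items() if len(who) >= min_mentions}
```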


Thanks for this. It is a highly informative comment, especially regarding big data algos.

Extending the 'hole' analogy, do you think the watchers / algorithms could still make a reasonable extrapolation about you if your group of closest acquaintances all decided to disappear from the network?

Perhaps even this more extreme measure would be fruitless as each of your friends has a fingerprint that they 'remove' from their respective unique graphs. Your group's disappearance would be a larger void, but each member's tendrils would carve out unique telltale gaps.


> Well, if location data is considered part of this "metadata", then I don't see how anyone could argue against the dangers of this

I remember a "scandal" in my country's Parliament in the early 2000s (2002 or 2003), when one of the local mobile carriers decided to display the names of GSM cell towers on phones' small screens (next to the battery indicator). Some MPs thought this was far too intrusive, though nobody paid much attention, because MPs are seen as corrupt by definition. The mobile company ended up no longer displaying the info (but still collecting it, of course), and everything was fine.

There was, of course, that other thing that happened to the same company (one of the 3 largest global brands in the industry) a couple of years later. A woman in the mobile company's office, jealous of her boyfriend, asked some guys "in the IT department" whether there was a way for them to check said boyfriend's messages and calls, all "as a small favor from colleague to colleague". Of course there was a way to do that. I can't remember if the boyfriend was cheating or not.


1. To communicate, Paula Broadwell and David Petraeus shared an anonymous email account

2. Instead of sending emails, both would login to the account, edit and save drafts

3. Broadwell logged in from various hotels' public Wi-Fi, leaving a trail of metadata that included times and locations

4. The FBI cross-referenced hotel guests with login times and locations, leading to the identification of Broadwell

http://www.guardian.co.uk/technology/interactive/2013/jun/12...
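The cross-referencing step amounts to a set intersection. Here is a sketch with hypothetical inputs: login `(hotel, date)` pairs and guest lists per hotel and date. It illustrates the idea and is not the FBI's actual tooling:

```python
def cross_reference(login_events, guest_lists):
    """login_events: list of (hotel, date) where the shared account was accessed.
    guest_lists: {(hotel, date): set(guest names)}.

    Returns the guests who were registered at every login location and date.
    """
    candidates = None
    for event in login_events:
        guests = guest_lists.get(event, set())
        candidates = set(guests) if candidates is None else candidates & guests
    return candidates or set()
```

Each additional login location shrinks the candidate set, which is why a handful of logins from different hotels can be enough.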


Didn't the 9/11 hijackers use this same technique (sharing an email account and communicating via drafts)? It sounds very familiar.


If you heard about it, you can bet he was. Nevertheless, power and abuses go hand-in-hand. I don't know what it is about human nature that causes us to give those in power the benefit of the doubt. Hell, in America at least, people knew 250 years ago that power begot abuse, and wrote "release valves" into the constitution to prevent that abuse from becoming overwhelming. I wonder why they didn't think people would become overly apathetic in the meantime.


> "I wonder why they didn't think people would become overly apathetic in the meantime."

The Founding Fathers were worried about this; they just didn't know of any systematic way to prevent it. I'm not sure there is one.


Did they ever write any essays or letters on why they didn't make voting compulsory? Was it a feeling that such compulsion impinged on freedoms, or that it wouldn't help fix the problem of apathy? Or did they just think it would be absurd if people voluntarily turned down their chance to pick their representation in government?


Given the remarkable intel that can be gathered, I'm surprised the NSA/CIA/FBI aren't giving away smartphones to targets as anonymous presents or under the pretense of winning a contest.


Perhaps that's what their plan is with the so-called "Obama-phones".


Why, when you can tap into the trunk lines and call databases?


Who says they aren't :)


New NSA agent position: cellphone vendor!


Eventually, all the social and location graphs will be mapped for all of humankind - and we shall find out that everyone, on the whole planet, is exactly 42 feet from Kevin Bacon.


If some agency like NSA etc wants to know about you in great detail, clearly they have the data, and will be able to very quickly put it all together.

The other side of this coin is that commercial parties like Facebook etc have the same potential detail and insight about anyone.

There is also a very high probability that similar data is being put together by entities somewhere between the NSA and Facebook, for purposes that are far more clearly against your interests, e.g. fraud.

Bottom line: anyone is an open book on the internet.


Does anyone know if these work as advertised? http://www.ebay.com/sch/items/?_nkw=cell+phone+signal+block&...

I rarely receive calls on my mobile - and only really carry one just in case I need to make a call.


How would they not? They're little Faraday cages, and as far as I've heard even Apple has not found a way to violate the laws of electromagnetics ;)


Why don't you just switch off your phone? That would save precious battery time, too…


I know this may sound ultra-paranoid, but I have heard rumors that switching it off may not be enough.

--Edit-- Thanks for all the replies. So does the Faraday cage accomplish the same as battery and SIM removal?


Not sure about smartphones, but IIRC, my old cellphone used to ring any alarm that was set even if the phone was completely turned off.


My generic Android phone does this.


Is this common? This is the first I've ever heard of such a feature, ever. Considering that we're talking about the OS being completely shut down, I'm skeptical of this existing in smartphones.


I believe it is normally a separate microcontroller. You generally need something to power up the main phone and to deal with battery charging (that's not usually the main CPU, although my Android phone does display an animated icon on screen when powered off and charging, so it's unclear what's driving this).

Most computers have a number of extra microcontrollers. You would have to do a teardown to see how they might be wired up.


My Nokia smartphone does it too.


It is not ultra-paranoid. The government's own rules regarding cell-phones in certain highly classified areas (SCIFs) require that they be left outside precisely because you can't guarantee that off means off.


I think that has less to do with wireless transmission than it does with recording devices.


Yup. For the ultra concerned, it's battery and SIM removal.


I've also heard they can be turned on by the tower, even if the phone is off. The only way to be sure is to keep the battery removed.


Slightly off-topic question: how can I collect my own metadata at this level (just calls and SMS)?

From what I can tell I need to collect:

- List of all incoming and outgoing calls and SMS

- Get my location data and match it to the timestamps (?) of the calls and SMSes

- Display this on a map.

Any suggestions on how to do this?
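Here is one possible approach, sketched under some assumptions. You have already exported the call/SMS log as `(unix_ts, kind, other_party)` tuples and location fixes as `(unix_ts, lat, lon)` sorted by time; how to export them depends on your phone and provider. Each event gets the nearest fix within `max_gap_s` seconds. The result is written as GeoJSON, which mapping libraries such as Leaflet can display:

```python
import bisect
import json


def attach_locations(events, fixes, max_gap_s=600):
    """events: list of (unix_ts, kind, other_party).
    fixes: list of (unix_ts, lat, lon), sorted by time.

    Attaches the nearest location fix within max_gap_s seconds; events with
    no fix close enough in time are dropped.
    """
    times = [f[0] for f in fixes]
    rows = []
    for ts, kind, party in events:
        i = bisect.bisect_left(times, ts)
        nearby = [fixes[j] for j in (i - 1, i) if 0 <= j < len(fixes)]
        if not nearby:
            continue
        best = min(nearby, key=lambda f: abs(f[0] - ts))
        if abs(best[0] - ts) <= max_gap_s:
            rows.append({"ts": ts, "kind": kind, "party": party, "lat": best[1], "lon": best[2]})
    return rows


def to_geojson(rows):
    """GeoJSON FeatureCollection; GeoJSON coordinates are [longitude, latitude]."""
    return json.dumps({
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature",
             "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
             "properties": {"ts": r["ts"], "kind": r["kind"], "party": r["party"]}}
            for r in rows
        ],
    })
```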


People might think that (apart from GPS) a signal to only one tower can't be localized. Add the variable of signal strength (with fairly uniform transmit power) to that single vector and it gets more interesting.
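To illustrate, the standard log-distance path loss model can be inverted to turn one signal-strength reading into a rough distance from the tower. The reference level, reference distance, and path-loss exponent below are illustrative assumptions. Real values vary a lot with terrain and antennas, so this gives a fuzzy ring around the tower, not a point:

```python
def estimate_distance_m(rssi_dbm, ref_rssi_dbm=-60.0, ref_distance_m=100.0, path_loss_exponent=3.5):
    """Invert the log-distance path loss model
        rssi = ref_rssi - 10 * n * log10(d / ref_distance)
    to estimate d from a single received-signal-strength reading."""
    return ref_distance_m * 10 ** ((ref_rssi_dbm - rssi_dbm) / (10 * path_loss_exponent))


for rssi in (-60, -95):
    print(rssi, round(estimate_distance_m(rssi)))
# -60 100
# -95 1000
```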


Is it just me, or did anyone else just throw up a little bit?

Almost overwhelming.



