You can infer some amazing things from simple metadata. I spent six months in an...

DanBC · on July 1, 2013

Interestingly, what you describe is probably not legal under EU privacy laws. People are horrified by the NSA just collecting this data. And yet you calmly describe this process.

Your opinions are not given in your post - you're not saying whether it's good or bad to do this - but it's clear that the company you worked for didn't see doing this as evil.

I find it fascinating that this kind of data mining has been going on for years and that opposition has been so quiet.

(Please, this post is not any judgement about you!)

drpancake · on July 1, 2013

All the telcos collect this data as far as I know. They're allowed to for the purposes of improving and maintaining their network. A few crunch it for marketing purposes but this has to be opt-in (not that customers would have any idea what that might entail, even if the privacy policy describes it broadly). I can't comment on the legality of the project I worked on, but I assume it was checked out by legal counsel.

I personally wouldn't want my data mined in this way. I don't retain any brand loyalty, let's put it that way.

adamcanady · on July 1, 2013

Does the EU actually have laws against collecting this data without opt-in for marketing?

On a related note: it would be really interesting to see privacy laws visualized around the world.

RobAley · on July 1, 2013

It may well be legal, if part of the stated purpose of collecting the data, as agreed with the customer in the T's and C's that they thoroughly read through and agreed to, was to collect data for research, network development, service development and other woolly terms that cover this kind of R&D.

What many companies do is anonymise the data: remove the actual phone numbers / account details and replace them with dummy numbers. While not ideal (backwards matching is possible due to the clues the data "gives up"), it's probably safer than it sounds.

noselasd · on July 1, 2013

There's also the matter of who's doing it. It is, imo, one thing that the company whose services I use collects data on my use of their equipment - for gathering network performance data, troubleshooting (and billing info) - data that they are using themselves and not handing over to other parties.

It's another thing to have government agencies snooping in such data for entirely different purposes.

endersshadow · on July 1, 2013

I think his point was that even anonymized data isn't as anonymous as you think.

bane · on July 1, 2013

It's basically what Google Now uses to figure out your home and work locations and prepare directions and traffic for you.

_pmf_ · on July 1, 2013

> Interestingly what you describe is probably not legal under EU privacy laws

Not only is it legal, it's essential for telcos to expand or enhance their cell coverage.

ronaldx · on July 1, 2013

If anonymisation is reversible (it seems clear that it is reversible, if it was anonymised at all), the data likely again falls under the Data Protection Directive in the EU.

There, personal data legally requires explicit permission for each specific purpose it's used for and cannot be stored any longer than is necessary for that purpose.

https://en.wikipedia.org/wiki/Data_Protection_Directive#Prin...

lifeisstillgood · on July 1, 2013

Improving cell coverage / planning for growth is a specific purpose. You can then argue how long to keep the metadata for - any reasonable argument starts in years.

I think we need to accept that metadata and all digital comms are communications in public. And that we need social conventions backed by law to make certain things politely not read unless a warrant is served.

ronaldx · on July 1, 2013

>Improving cell coverage / planning for growth is a specific purpose.

Agreed, so you just need contractual permission (not hard to get). You can't decide later that you want to use it for some other apparently-innocuous reason.

>any reasonable argument starts in years.

Cell coverage data from years ago is relevant to today's growth? Although that might be enough to get you out of a legal hole, I find that highly dubious.

The law is there, but it's not understood, not clear enough, nor enforceable enough for commerce to fall in line.

sfall · on July 1, 2013

What, your company doesn't make year-over-year comparisons? Many telcos, at least in the US, don't own every tower; they rent, and they need to be able to compare: we aren't utilizing this tower, so is this a downward trend? Should we not renew our contract for this cell tower location? Then there are the planning aspects of anticipating heavy use patterns for major events, concerts, festivals, etc. You can't compare how your system is handling the added demand as an event grows if you don't have data.

pc86 · on July 1, 2013

Presumably one would need to review trends in population growth, movement, etc.

e12e · on July 1, 2013

This is exactly what the NSA have been doing, according to an earlier whistleblower; see e.g. my submission to HN here with his keynote from HOPE 9 (2012):

https://news.ycombinator.com/item?id=5964403

edit: To save a click: http://www.youtube.com/watch?v=dxnp2Sz59p8 [NSA whistleblower William Binney Keynote at HOPE 9 (2012) [video]]

mjn · on July 2, 2013

I went to a presentation by DONG Energy [1] where they were discussing how, with their in-progress upgrade to remote-reporting, per-residence electric meters, they can soon infer all sorts of things about people's apartments and daily habits from the distinctive patterns of electric usage. Not just aggregate usage like inferring when someone's awake or asleep, but in much more detail based on the distinctive patterns different devices make.

They do seem sensitive to privacy fears (perhaps partly because the regulatory climate forces them to be), but the level of detail they were able to get out of the electric data in a prototype system was quite eye-opening. They had some ideas about using it for consumer self-education, e.g. feeding it back into a small display near the meter that would make energy-saving suggestions. But even that could get creepy, because it could make suggestions about specific devices you owned, when you never told the energy company that you owned them!

[1] An unfortunate acronym, from Danish Oil and Natural Gas

mcintyre1994 · on July 1, 2013

Could you just clarify something? You state this data is anonymous, but that you use phone numbers as nodes? Do you mean some sort of ID number representing phone numbers, or actual phone numbers? I ask because I wouldn't consider phone numbers anonymous.

kenrikm · on July 1, 2013

They could easily SHA1 the phone numbers to "Anonymize" them.

trapexit · on July 1, 2013

The input space is too small for SHA1 to effectively anonymize. The NANP, for example, has fewer than 10^10 possible numbers; it would be a very simple task to create a rainbow table mapping every possible phone number to its corresponding SHA1 hash.

For the same reason, you can't just use a simple cryptographic hash to "anonymize" data such as birthdates, zip codes, SSNs, or PINs.
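To make the small-input-space point concrete, here is a minimal sketch in Python. It builds a plain hash-to-number lookup table (simpler than a true rainbow table, which trades storage for recomputation) over a single hypothetical exchange of 10,000 numbers. The number "2125550142" and the helper names are made up for illustration; covering the whole ten-digit space is the same loop with more iterations.

```python
import hashlib


def sha1_hex(number: str) -> str:
    """Return the hex SHA1 digest of a phone number string."""
    return hashlib.sha1(number.encode("ascii")).hexdigest()


def build_lookup(prefix: str, digits: int) -> dict[str, str]:
    """Map SHA1 hash -> number for every number that starts with `prefix`."""
    table = {}
    for i in range(10 ** digits):
        number = f"{prefix}{i:0{digits}d}"  # zero-padded suffix
        table[sha1_hex(number)] = number
    return table


# Hypothetical "anonymized" record: only the hash is stored.
leaked_hash = sha1_hex("2125550142")

# Enumerate one exchange (212-555-xxxx): 10,000 candidates.
table = build_lookup("212555", 4)
recovered = table.get(leaked_hash)
print(recovered)
```

Nothing here depends on SHA1 specifically; any fast, unkeyed hash over an enumerable set of inputs can be reversed the same way.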

Using a key derivation function with a very high cost factor can mitigate this to some extent (e.g. making it take 5 seconds on an average CPU to generate the hash from a phone number), but it by no means makes for secure anonymization; eventually computing power will catch up.
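A rough sketch of the cost-factor idea, using PBKDF2 from Python's standard library as a stand-in for "a key derivation function" (the comment does not name one). The fixed dataset-wide salt, the iteration counts and the function name are assumptions for illustration; the salt is kept constant so that the same number still maps to the same pseudonym.

```python
import hashlib
import time

# Hypothetical fixed salt: identical numbers must still produce identical output.
SALT = b"dataset-wide-salt"


def slow_pseudonym(number: str, salt: bytes, iterations: int) -> str:
    """Derive a pseudonym with PBKDF2-HMAC-SHA256; cost grows with `iterations`."""
    digest = hashlib.pbkdf2_hmac("sha256", number.encode("ascii"), salt, iterations)
    return digest.hex()


# Time a single derivation at two cost factors.
for iterations in (1_000, 100_000):
    start = time.perf_counter()
    slow_pseudonym("2125550142", SALT, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations} iterations: {elapsed:.4f}s per number")
```

An attacker enumerating the number space pays roughly the per-number time multiplied by the number of candidates, so raising the iteration count raises the cost of a full table linearly. It only delays the attack, which is the caveat the comment makes.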

Encrypting the number with a secret key (or using an HMAC), and destroying the key after the anonymization takes place might be a reasonably secure way of doing this, however.
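The HMAC variant could look something like this sketch. The call records, the 16-character truncation and the function names are hypothetical. The key exists only inside the function and is never stored or returned; note that Python gives no guarantee the key bytes are wiped from memory, so "destroying" it here just means dropping every reference.

```python
import hashlib
import hmac
import secrets


def pseudonymize_calls(calls):
    """Replace each number in (caller, callee) pairs with a keyed pseudonym."""
    key = secrets.token_bytes(32)  # random one-off key, never written anywhere

    def tag(number: str) -> str:
        mac = hmac.new(key, number.encode("ascii"), hashlib.sha256)
        return mac.hexdigest()[:16]  # truncated only to keep IDs short

    # The key goes out of scope when this function returns.
    return [(tag(caller), tag(callee)) for caller, callee in calls]


# Hypothetical call records: (caller, callee).
calls = [
    ("2125550142", "2125550199"),
    ("2125550199", "2125550107"),
    ("2125550142", "2125550107"),
]
edges = pseudonymize_calls(calls)
for caller, callee in edges:
    print(caller, "->", callee)
```

Within one run the same number always gets the same pseudonym, so the call graph is preserved, and without the key an attacker cannot precompute a table of candidates. What this does not address is identifying people from the structure of the graph itself.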

anonymous · on July 2, 2013

Maybe just salt each number with a random salt?

drpancake · on July 1, 2013

Yep, we effectively did this. But as my comment alludes to, it doesn't really matter. You have enough to uniquely identify someone.