The two statements are consistent. In order to do the analysis they need to keep...

zzleeper · on Jan 8, 2013

Maybe they can delete the first 100m of every trip, so it would be harder to pinpoint locations w/out losing that much
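A minimal sketch of that trimming idea, assuming each trip is an ordered list of (lat, lon) points in degrees. The function names and the 100 m default are illustrative only, not anything the study is known to use:

```python
import math

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))

def trim_start(points, trim_m=100.0):
    """Drop leading points until at least trim_m metres have been travelled."""
    travelled = 0.0
    for i in range(1, len(points)):
        travelled += haversine_m(points[i - 1], points[i])
        if travelled >= trim_m:
            return points[i:]
    return []  # the whole trip is shorter than trim_m

# Hypothetical trip heading north in steps of roughly 55 m.
trip = [(0.0, 0.0), (0.0005, 0.0), (0.001, 0.0), (0.0015, 0.0)]
trimmed = trim_start(trip)
```

Running the same function on the reversed list would trim the end of the trip as well. Whether 100 m is enough to hide a home address is a separate question.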

adrianN · on Jan 9, 2013

"On the Anonymity of Home/Work Location Pairs"

Abstract. Many applications benefit from user location data, but location data raises privacy concerns. Anonymization can protect privacy, but identities can sometimes be inferred from supposedly anonymous data. This paper studies a new attack on the anonymity of location data. We show that if the approximate locations of an individual’s home and workplace can both be deduced from a location trace, then the median size of the individual’s anonymity set in the U.S. working population is 1, 21 and 34,980, for locations known at the granularity of a census block, census track and county respectively. The location data of people who live and work in different regions can be re-identified even more easily. Our results show that the threat of re-identification for location data is much greater when the individual’s home and work locations can both be deduced from the data. To preserve anonymity, we offer guidance for obfuscating location traces before they are disclosed.

http://xenon.stanford.edu/~pgolle/papers/commute.pdf
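To make "median size of the anonymity set" concrete, here is a rough sketch, assuming each person is reduced to a (home region, work region) pair. The region labels and the function name are made up for illustration and this is not the paper's code:

```python
from collections import Counter
from statistics import median

def median_anonymity_set(pairs):
    """Median, over people, of how many people share their (home, work) pair."""
    counts = Counter(pairs)
    return median(counts[p] for p in pairs)

# Hypothetical population: three people share one pair, two are unique.
people = [("A", "X"), ("A", "X"), ("A", "X"), ("B", "Y"), ("C", "Z")]
result = median_anonymity_set(people)
```

A median of 1 means at least half the population is the only person with their particular home/work pair.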

josephlord · on Jan 9, 2013

100m radius of two end points on a simple trip would narrow it down a lot and would in many cases allow you to identify an individual.

If additional journeys are also linked to the same phone, identifying individuals can get even easier.

mrb · on Jan 8, 2013

Then it sounds like they have data tracking drivers even off the main roads: homes, driveways, small residential streets, parking lots, office buildings. IOW it sounds like they could have done a better job at anonymizing it by truncating driver paths that are off the main roads and highways to only keep data relevant to their study.

alttab · on Jan 9, 2013

How do you know what data is relevant? They did the right thing by not sharing the data.

mrb · on Jan 9, 2013

They study high traffic density, so by definition, irrelevant data is data where traffic is under a certain density, which would automatically exclude private areas (homes, driveways, etc).
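Roughly what I mean, as a sketch: assume each trip has already been matched to a list of road segment IDs (the IDs, the function name and the threshold of 3 are all made up), then drop any segment that too few distinct trips pass through:

```python
from collections import Counter

def filter_low_density(trips, min_trips=3):
    """Keep only segments that at least min_trips distinct trips pass through."""
    # set(trip) so a trip that loops over a segment is only counted once
    counts = Counter(seg for trip in trips for seg in set(trip))
    return [[seg for seg in trip if counts[seg] >= min_trips] for trip in trips]

# Hypothetical trips: private driveways are unique, the main roads are shared.
trips = [
    ["driveway_a", "main_st", "highway_1"],
    ["driveway_b", "main_st", "highway_1"],
    ["driveway_c", "main_st", "highway_1"],
]
filtered = filter_low_density(trips)
```

This only removes the sparse segments. The points where the remaining traces start and stop can still hint at where someone lives, so it is a mitigation and not a guarantee.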

mrb · on Jan 9, 2013

Ah, unexplained downvotes on what I believe is a reasonable point I make...

At the very least, if data cannot be made anonymous, and can so easily be associated with persons, then this is an argument that they should never have collected it without my consent in the first place. This would be an invasion of my privacy.