
The two statements are consistent. To do the analysis, they need to keep track of the full paths that individual drivers take. Given those paths, particularly over a period of time, it is easy to identify individuals, even if you don't have their names to begin with.



Maybe they could delete the first 100 m of every trip, so it would be harder to pinpoint locations without losing much data.
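To make the idea concrete, here is a rough sketch of what trimming a trip could look like. It assumes a trip is just a list of (lat, lon) points in travel order; the function names and the optional end trim are my own additions for illustration, not anything the study is known to use.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, good enough for short distances


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def trim_trip(points, trim_start_m=100.0, trim_end_m=0.0):
    """Drop points within trim_start_m of the trip start (measured along the path)
    and, optionally, within trim_end_m of the trip end."""
    if not points:
        return []
    # Cumulative distance travelled when each point is reached.
    cum = [0.0]
    for prev, cur in zip(points, points[1:]):
        cum.append(cum[-1] + haversine_m(prev, cur))
    total = cum[-1]
    return [
        p for p, d in zip(points, cum)
        if d >= trim_start_m and total - d >= trim_end_m
    ]


# Hypothetical trip: six points along the equator, roughly 55.6 m apart.
trip = [(0.0, i * 0.0005) for i in range(6)]
print(trim_trip(trip))                 # start trimmed only
print(trim_trip(trip, 100.0, 100.0))   # both ends trimmed
```

Trimming by distance along the path rather than by straight-line radius is a design choice here; it only hides the endpoints to the extent that the remaining path does not point straight at them.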


"On the Anonymity of Home/Work Location Pairs"

Abstract. Many applications benefit from user location data, but location data raises privacy concerns. Anonymization can protect privacy, but identities can sometimes be inferred from supposedly anonymous data. This paper studies a new attack on the anonymity of location data. We show that if the approximate locations of an individual’s home and workplace can both be deduced from a location trace, then the median size of the individual’s anonymity set in the U.S. working population is 1, 21 and 34,980, for locations known at the granularity of a census block, census tract and county respectively. The location data of people who live and work in different regions can be re-identified even more easily. Our results show that the threat of re-identification for location data is much greater when the individual’s home and work locations can both be deduced from the data. To preserve anonymity, we offer guidance for obfuscating location traces before they are disclosed.

http://xenon.stanford.edu/~pgolle/papers/commute.pdf
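For anyone unfamiliar with the term, an individual's anonymity set here is the group of people who share the same home/work region pair. A minimal sketch of how the median could be computed, using made-up region labels rather than any real census data (the function name is hypothetical and this is not the paper's code):

```python
from collections import Counter
from statistics import median


def anonymity_set_sizes(pairs):
    """For each person's (home_region, work_region) pair, return how many
    people in the data set share that exact pair (including themselves)."""
    counts = Counter(pairs)
    return [counts[p] for p in pairs]


# Hypothetical toy population: one (home, work) pair per person.
population = [
    ("A", "X"), ("A", "X"),
    ("A", "Y"),
    ("B", "X"), ("B", "X"), ("B", "X"),
]

sizes = anonymity_set_sizes(population)
print(sizes)
print(median(sizes))
```

A size of 1 means the pair is unique to one person, which is the situation the abstract reports as the median case at census-block granularity.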


A 100 m radius around the two end points of a simple trip would narrow it down a lot and would in many cases allow you to identify an individual.

If additional journeys are also linked to the same phone, identifying individuals can get even easier.


Then it sounds like they have data tracking drivers even off the main roads: homes, driveways, small residential streets, parking lots, office buildings. IOW it sounds like they could have done a better job at anonymizing it by truncating driver paths that are off the main roads and highways to only keep data relevant to their study.


How do you know what data is relevant? Ey did the right thing by not sharing the data.


They study high traffic density, so by definition, irrelevant data is data where traffic is under a certain density, which would automatically exclude private areas (homes, driveways, etc).
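A rough sketch of that kind of density cut, assuming each trip has already been mapped to a sequence of road segment IDs (the segment names, the function name, and the threshold below are all made up for illustration):

```python
from collections import Counter


def filter_low_density(trips, min_trips=3):
    """Keep only the segments that appear in at least min_trips distinct trips."""
    seg_counts = Counter()
    for trip in trips:
        # Count each trip at most once per segment, even if it revisits it.
        seg_counts.update(set(trip))
    return [
        [seg for seg in trip if seg_counts[seg] >= min_trips]
        for trip in trips
    ]


# Hypothetical trips: private driveways and a parking lot are used by one trip each.
trips = [
    ["driveway1", "main_st", "highway"],
    ["driveway2", "main_st", "highway"],
    ["driveway3", "main_st", "highway", "parking_lot"],
]

print(filter_low_density(trips, min_trips=3))
```

Note that this only removes sparsely used segments; whether the remaining paths are hard enough to link back to a person is a separate question that a threshold alone does not answer.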


Ah, unexplained downvotes on what I believe is a reasonable point I make...

At the very least, if the data cannot be made anonymous and can so easily be associated with individuals, then that is an argument that they should never have collected it without my consent in the first place. This would be an invasion of my privacy.



