Hacker News
Web Browsers Leave 'Fingerprints' Behind as You Surf the Net (eff.org)
64 points by stanleydrew on May 17, 2010 | 28 comments



I played around with this for work and fonts appear to be the biggest culprit in selling you out. User Agents can be somewhat unique depending on what you've installed that may have modified them, but fonts are far worse. From what I gather, the Flash app returns the fonts in installation order, not alphabetically. So unless your system is factory fresh, it may still be distinguishable from someone else with the exact same setup, provided you installed any fonts (or apps with bundled fonts) in a different order.
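If the probe really does return fonts in install order (that's the commenter's reading, not something verified here), any fingerprint that hashes the raw list will differ between two machines with identical fonts. A minimal Python sketch; the font names and the font_fingerprint helper are hypothetical, and SHA-256 is just an arbitrary choice of hash:

```python
import hashlib

def font_fingerprint(fonts, canonical=False):
    """Hash a font list into a short hex fingerprint.

    canonical=False hashes the list in the order given (e.g. install order);
    canonical=True sorts it first, so ordering no longer matters.
    """
    items = sorted(fonts) if canonical else list(fonts)
    joined = "\n".join(items).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()[:16]

# Two hypothetical machines with the same fonts, installed in a different order.
machine_a = ["Arial", "Calibri", "Consolas", "Segoe UI"]
machine_b = ["Arial", "Consolas", "Calibri", "Segoe UI"]

print(font_fingerprint(machine_a) == font_fingerprint(machine_b))  # False
print(font_fingerprint(machine_a, canonical=True)
      == font_fingerprint(machine_b, canonical=True))              # True
```

Sorting before hashing collapses the difference; without it, any non-determinism in the order fonts or plugins are reported also turns into extra "uniqueness".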


For me it is plugins. I only have a couple installed in chrome, but apparently there are also system-wide plugins that it is picking up info on. 19.68+ bits of info based on plugins alone.


One thing I don't get about this: I've been trying it with completely default installs of Fedora Core 11 running Firefox and it calls them all unique.

I feel like I am missing something obvious - anyone care to point me in the right direction (I read most of the links of the project site)?


It's the bits of entropy. If you get N bits, you're unique within roughly 2^N users on the web as a whole (according to their model). Put another way, by the way they measure uniqueness, your browser carries N bits of identifying information.

On a hypothetical website that had 2^N different users, you could be uniquely identified. Specifically, you'd be the only FC11/ffox user in the mix :-)
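The arithmetic behind "bits of entropy" is short: if a fraction p of browsers share your value, that value carries -log2(p) bits, and "one in N browsers" means p = 1/N, i.e. log2(N) bits. A sketch (the function names are illustrative, not Panopticlick's code):

```python
import math

def surprisal_bits(one_in: float) -> float:
    """Bits of identifying information for a value shared by 1 in `one_in` browsers."""
    return math.log2(one_in)

def anonymity_set(bits: float) -> float:
    """How many browsers you'd expect to share a value carrying this many bits."""
    return 2 ** bits

for n in (200_000, 400_000, 840_000):
    print(f"1 in {n:,} -> {surprisal_bits(n):.2f} bits")
```

A value seen in 1 in 840,000 browsers works out to about 19.68 bits. A value nobody else in the sample has can only be bounded below by log2 of the sample size, which is presumably why figures like "19.68+ bits" carry a plus sign.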


You misunderstand. Identical default installs still report as unique. I couldn't understand what they tested that could differentiate them (because by rights they should all report identically - or at the very least some of them should).

This was tested with 25 identical machines I was commissioning for a task.


Well, it reports how it determined it was unique (i.e. the browser characteristics), so perhaps you can look at that.

When I visited the page 10 minutes ago, it said I was unique. I cleared everything and went back to the page, and it said one in 400k-something visitors had the same fingerprint as me. I did it again and it was one in 200k-something. So, at least in my case, it does seem to realize that the three visits had the same fingerprint.


There may be some non-determinism in the order of initializing plugins, or fonts. This could result in a unique order when they are polled.


Aha. You've nailed it, I think, partly anyway. It looks like the plugins do come up in different orders...


Without GUIDs, no machine is absolutely unique. The bits of entropy are relative measures.


I am guessing that the fingerprinting problem extends far beyond the browser. What about connection based fingerprinting? (TCP/IP stack, etc.) What about order of fetching images from a website? What about timing attacks that measure the time taken to load a page (browser caches, cache sizes, eviction policies, upstream squid caches, etc.)? The whole system is way too complicated.

And I feel that having additional extensions/plugins to combat this, unless deployed universally, is still going to contribute to the entropy and help increase the chances of you getting fingerprinted.

Edit: the pdf on the website talks about it. https://panopticlick.eff.org/browser-uniqueness.pdf

"The Curse of Dimensionality."


I'm unique alright, and it appears the main culprit is the identification of my System Fonts/User_Agent.


Same here. For me the fonts are even completely unique (the "one in x browsers have this value" column is >840'000). Yet I'm using a fairly standard install (I can't remember installing any fonts). Maybe some programs install fonts automatically, making you uniquely identifiable quite easily.

But I still wonder how fast the birthday paradox/problem would hit such an identification system. (http://en.wikipedia.org/wiki/Birthday_problem)
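On the birthday question: under the optimistic assumption that b-bit fingerprints are spread uniformly over 2^b values, the chance that some pair of n users collides grows roughly like n^2/2^(b+1), so collisions appear around sqrt(2^b) users, long before n reaches 2^b. For tracking one person, though, what matters is whether anyone shares your particular fingerprint, which grows only linearly in n. A sketch of both (function names are made up for illustration):

```python
import math

def collision_probability(n_users: int, bits: float) -> float:
    """P(some pair among n_users shares a fingerprint), with 2**bits equally
    likely fingerprints (birthday approximation 1 - exp(-n(n-1)/(2d)))."""
    d = 2.0 ** bits
    return -math.expm1(-n_users * (n_users - 1) / (2.0 * d))

def own_collision_probability(n_users: int, bits: float) -> float:
    """P(at least one of the other n_users - 1 users shares YOUR fingerprint),
    i.e. 1 - (1 - 1/d)**(n_users - 1), computed stably."""
    d = 2.0 ** bits
    return -math.expm1((n_users - 1) * math.log1p(-1.0 / d))

def users_for_even_odds(bits: float) -> float:
    """Crowd size at which some collision becomes ~50% likely: sqrt(2 d ln 2)."""
    return math.sqrt(2.0 * (2.0 ** bits) * math.log(2))

n = round(users_for_even_odds(33))
print(n, collision_probability(n, 33), own_collision_probability(n, 33))
```

For b = 33 (2^33 is about 8.6 billion values), some pair collides with even odds in a crowd of roughly 109,000, while any particular user in that crowd has only on the order of a 0.001% chance of being shared.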


I am too, and I have a default installation. How did fonts get so unique?


It's amazing how quickly you can single out people using supposedly anonymized datasets like this. There's a CMU paper noting that 87% of Americans are uniquely identified by their birthdate, 5-digit zip code, and gender: http://arstechnica.com/tech-policy/news/2009/09/your-secrets...

Statistics are neat.
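The birthdate/ZIP/gender result is the same arithmetic as browser fingerprinting: independent attributes multiply the number of combinations, so their bits add. A back-of-the-envelope Python sketch; the counts below (about 80 birth years, about 40,000 ZIP codes, 300 million people) are rough assumptions for illustration, not figures from the paper, and the uniform, independent model is a simplification:

```python
import math

# Rough, hypothetical figures for illustration only. Real attributes are
# neither uniform nor independent, so these bits are an upper bound.
BIRTH_DATES = 365 * 80      # assume ~80 plausible birth years
GENDERS = 2
ZIP_CODES = 40_000          # assumed count of populated 5-digit ZIPs
POPULATION = 300_000_000    # approximate US population

def bits(n_values: int) -> float:
    """Entropy of an attribute with n_values equally likely values."""
    return math.log2(n_values)

# Independent attributes: combination counts multiply, so bits add.
total_bits = bits(BIRTH_DATES) + bits(GENDERS) + bits(ZIP_CODES)
combinations = BIRTH_DATES * GENDERS * ZIP_CODES

# Under the uniform model: chance that nobody else shares your combination,
# (1 - 1/combinations) ** (POPULATION - 1), computed stably with log1p.
p_unique = math.exp((POPULATION - 1) * math.log1p(-1 / combinations))

print(f"{total_bits:.1f} bits vs {bits(POPULATION):.1f} bits of population")
print(f"P(unique) ~ {p_unique:.2f}")
```

It prints about 31.1 bits of combinations against about 28.2 bits of population, and a uniqueness probability near 0.88. That a toy model lands near the quoted 87% should not be read as confirming the study's method.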


Time for some http://modifyheaders.mozdev.org/

edit: panopticlick gets plugin and font list via javascript; modifyheaders can't do anything about that ...


I believe it actually gets the font lists using a tiny flash app. Check out this project for an example:

http://github.com/gabriel/font-detect-js


So the browser is not really the culprit--it's flash.


Unfortunately, it's not that simple. This specific implementation happens to use flash because it's fast and easy, but there is also a javascript/css based way of doing it:

http://www.lalit.org/lab/javascript-css-font-detect

It looks incredibly delicate, but it seems to work.

Also, I blocked their Flash script and reloaded the page. My fingerprint changed, but I was still a unique butterfly. An absence of fonts is also a distinguishing characteristic.
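Roughly speaking, the JavaScript/CSS approach works by measurement rather than enumeration: render a test string in a generic fallback family, then again with the candidate font listed first; if the width changes, the candidate must be installed. Below is a Python simulation of that logic with a fake renderer standing in for the browser (the per-character widths, font list, and helper names are all hypothetical):

```python
from typing import Callable, Dict, List

BASE_FAMILIES = ["monospace", "sans-serif", "serif"]
TEST_STRING = "mmmmmmmmmmlli"

# Stand-in for the browser: width of `text` rendered with a font-family stack.
# In a page, a script would read this off a hidden element's rendered width.
Measure = Callable[[str, List[str]], float]

def detect_fonts(candidates: List[str], measure: Measure) -> List[str]:
    """Return candidates whose presence changes the width of at least one fallback."""
    baseline = {base: measure(TEST_STRING, [base]) for base in BASE_FAMILIES}
    found = []
    for font in candidates:
        # An installed candidate renders instead of the fallback, so the width
        # usually differs from that fallback's baseline.
        if any(measure(TEST_STRING, [font, base]) != baseline[base]
               for base in BASE_FAMILIES):
            found.append(font)
    return found

def make_fake_measure(installed: Dict[str, float]) -> Measure:
    """Hypothetical renderer: each font has a fixed per-character width."""
    generic = {"monospace": 7.0, "sans-serif": 6.0, "serif": 5.5}

    def measure(text: str, stack: List[str]) -> float:
        for family in stack:          # first usable family in the stack wins
            if family in installed:
                return installed[family] * len(text)
            if family in generic:
                return generic[family] * len(text)
        raise ValueError("no usable font in stack")

    return measure

measure = make_fake_measure({"Verdana": 6.4, "Georgia": 5.9})
print(detect_fonts(["Verdana", "Georgia", "Comic Sans MS"], measure))
```

Comparing against three different generic families guards against a font that happens to match one fallback's width. This kind of probe needs only CSS and the ability to read back element widths, so blocking Flash doesn't remove it.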


On Windows 7 it says I don't have Times/Times New Roman/Verona (via Firefox 3.6.3)...


I don't have Flash installed and it claimed to be getting the fonts via Java instead.


This is something that I hope the Chrome/Chromium team will do something about. They've been moving fast on a lot of things lately, and if they take the lead others will eventually follow...


I'm not sure why they would. This information seems extremely relevant to Google's line of business.


As far as I know Google tracks with cookies in the standard way. I would be pretty shocked if they were tracking based on anything like fonts or plugins. Is there already evidence to the contrary though?


While I'm not aware of anything it certainly stands to reason that they would, especially with their DoubleClick network.

If the EFF with their limited resources figured this out, Google likely would have, too.


Google and Doubleclick just give you a cookie, though. They don't need to be nefarious.


Google could not feasibly track their user base this way. The load time for a Flash application that returns font lists (though short) is much longer than they or their users would tolerate.


Not to mention totally unnecessary. Most people are just fine with cookies.


What is the reason for plugins being sent in the headers? If it's only for the "you must have Flash 10 installed" messages, then I'd like to disable that.



