I'm the (co)author of the project. Please note that it's just a one-day proof of concept; imagine what a well-motivated corporation could do.
It's nothing new (someone correctly pointed to the EFF project), but I wanted to make a real-world demo out of it.
The demo doesn't store each bit of info separately; it simply creates a hash out of them. If I stored the data separately I could, for example, identify small user agent updates, screen resolution changes, newly installed plugins, and so on.
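A minimal sketch of that difference, assuming SHA-1 as the hash (the demo's actual hash function is not stated here) and using made-up attribute names and values. A single hash only answers "same or different", while separately stored attributes show which data point changed:

```python
import hashlib


def fingerprint_hash(attrs):
    """Collapse all attributes into one hex digest (SHA-1 is an assumption)."""
    # Sort the keys so the same attributes always produce the same string.
    canonical = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def changed_attributes(old, new):
    """List the attribute names whose values differ between two visits."""
    keys = sorted(set(old) | set(new))
    return [key for key in keys if old.get(key) != new.get(key)]


# Hypothetical visitor data, for illustration only.
before = {
    "screenSize": "1920x1080",
    "timezone": "-60",
    "plugins": "Flash 11.2,QuickTime",
    "userAgent": "ExampleBrowser/18.0",
}
after = dict(before, userAgent="ExampleBrowser/18.1")  # minor browser update

print(fingerprint_hash(before) == fingerprint_hash(after))  # False
print(changed_attributes(before, after))                    # ['userAgent']
```

With only the hash, the updated browser looks like a brand-new visitor; with the stored attributes, it looks like the old visitor with one small change.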
Also, much of the info can be gathered without JS, and browsing with NoJS actually puts you in a very restricted niche, making you even more trackable ;)
The demo is far from perfect, but I believe that even 90% reliability is alarming. Anyway, I'll put all the source code on GitHub for you to review. I hope to be able to add the NoScript code as well soon.
IP comparison alone only works if the user's IP is constant, which rules out many phones, tablets, and laptops that frequent coffee shop and restaurant wifi. It also doesn't allow you to tell when IP $a and IP $b are really the same person at home and work. Further, IP comparison doesn't let you distinguish between multiple users coming from the same home or office network, or from a proxy.
Commercial implementations of this sort of tracking include the user's IP in their dataset, but they also track a number of other datapoints. That way, if e.g. everything but a user's IP matches an entry in their database, they can tell it's probably the same person connecting from a different location.
You can view source to see what they're using to generate the fingerprint: screenSize, devicePixelRatio, timezone, mimeTypes, plugins, httpAcceptHeaders, fonts. It's interesting that these are enough to generate a moderately unique fingerprint. I'm sure my fonts list is unique, so that's probably enough to ID me right there. However, not every computer I use has the same fonts installed, nor the same screen dimensions. This won't track me as I go from my desktop to mobile device.
Professional authentication managers such as RSA Adaptive Authentication can gather 40 or 50 data points from which to tell whether a user is more or less the same one as before. They apply a ratio to the generated value, and depending on the result a user can be redirected to a challenge question. It's not foolproof, but it prevents a lot of automated phishing or botnet scams from being able to automatically log in with your credentials.
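A rough sketch of that kind of decision. This is not RSA's actual algorithm; the scoring rule, the 0.8 threshold, and all data points below are assumptions for illustration. The score is the fraction of enrolled data points the current session reproduces, and a score under the threshold triggers a challenge question:

```python
def match_score(enrolled, observed):
    """Fraction of enrolled data points reproduced by the current session."""
    if not enrolled:
        return 0.0
    hits = sum(1 for key, value in enrolled.items() if observed.get(key) == value)
    return hits / len(enrolled)


def decide(enrolled, observed, threshold=0.8):
    """Return 'allow' or 'challenge'; the threshold is a made-up value."""
    return "allow" if match_score(enrolled, observed) >= threshold else "challenge"


# Hypothetical data points recorded when the user enrolled.
enrolled = {
    "userAgent": "ExampleBrowser/18.0",
    "timezone": "-60",
    "screenSize": "1920x1080",
    "language": "en-US",
    "plugins": "Flash 11.2",
}
same_user_new_screen = dict(enrolled, screenSize="1366x768")
scripted_login = {"userAgent": "curl/7.0", "timezone": "0", "language": "en-US"}

print(decide(enrolled, same_user_new_screen))  # allow (4 of 5 match)
print(decide(enrolled, scripted_login))        # challenge (1 of 5 match)
```

A real product would presumably weight data points differently and use far more of them, but the shape of the check is the same: stolen credentials alone do not reproduce the rest of the profile.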
Yes, I tried Panopticlick on a number of different computers and it was always the fonts that gave them the most bits of entropy. I wonder if installing a couple of new fonts and deleting a few others (which I do from time to time) would make them believe I'm a different person...
But really, browsers should stop allowing scripts to access the full list of available fonts. What use does a website have for that data, anyway? Any site that doesn't want to use one of the standard fonts should be using webfonts nowadays.
I would say that how unique the fingerprint really is is actually an important issue. I wonder how much traffic or time it takes before collisions start to occur. In the current setting it would probably suffice if the fingerprint took one of just 100 or 500 values: the traffic is probably rather low, you visit the page for maybe a few minutes, and you probably won't go back to it in 3 days, or even in 1 hour, just to check whether some other guy overwrote your secret word.
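To put rough numbers on the collision question, here is a small sketch that assumes fingerprints are spread uniformly over a fixed number of values (real fingerprints are not uniform, so treat the results as optimistic). It computes two different things: the chance that some other visitor lands on your particular value, and the birthday-problem chance that any two visitors collide:

```python
def overwritten(other_visitors, values):
    """Chance that at least one other visitor shares YOUR fingerprint value."""
    return 1.0 - (1.0 - 1.0 / values) ** other_visitors


def any_collision(visitors, values):
    """Birthday problem: chance that any two visitors share a value."""
    if visitors > values:
        return 1.0  # pigeonhole: more visitors than possible values
    p_all_distinct = 1.0
    for i in range(visitors):
        p_all_distinct *= (values - i) / values
    return 1.0 - p_all_distinct


# 50 visitors during the window in which you might check your word
# (the visitor count is an arbitrary example).
for values in (100, 500):
    print(values, round(overwritten(50, values), 3), round(any_collision(50, values), 3))
```

The first number is the one that matters for "did someone overwrite my word"; the second grows much faster and is what the site as a whole would experience.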
Regardless, it's a very interesting idea, and it also illustrates how difficult and counter-intuitive security can be if you do not study such issues. As an API designer I would surely have a hard time foreseeing that exposing the screen size or font list can turn out to be a security issue for users.
That said, they can probably eventually start correlating the different fingerprints using other data, like device id, location patterns, etc. It would not be impossible to build a dossier of all your browsers and devices, especially if you ever log in to any online service from multiple machines.
Simply resizing my browser window before pasting the second URL seems to thwart this (but I don't have Flash installed). Without Flash, it falls firmly into the "kinda-works sometimes if everything goes perfect" camp.
Resizing my browser didn't do it, but when I moved it to another screen the fingerprint changed from "1ccf9e9301db4fb87b1d178d77edad5bfa598057" to "ab0e6beb449408b28473dd66a6f4501528087c0e". I don't think this method is perfect at all, or that it should be used for anything that needs to be reliable (like logins).
Enabling click-to-play in Chrome (and not one of those extensions that hide the element or remove it from the DOM once it's loaded) just makes the fingerprint have fewer bits, since it can't get the list of fonts installed on your computer.
That's a very low value. With JS off or blocked, my browsers convey between 16 and 21 bits (the latter meaning uniqueness in their dataset).
The amount of information conveyed by the HTTP_ACCEPT headers is especially worrying. On a modern browser there is nothing in there, apart from maybe the language, that should leak any info. And certainly not 10 or 16 bits of info.
"Your browser fingerprint appears to be unique among the 2,155,876 tested so far."
Huh, that's awesome ... in a bad way. According to that page, both my system fonts and browser plugin details are unique among the browsers they've tested thus far.
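The "bits" figures and the "unique among 2,155,876" message quoted in this thread are two views of the same quantity. A short worked example, assuming the usual definition of self-information (the function name is made up): a value shared by `matching` out of `total` browsers carries -log2(matching / total) bits.

```python
import math


def surprisal_bits(matching, total):
    """Bits of identifying information in a value seen in `matching` of `total` browsers."""
    return -math.log2(matching / total)


# Being the only one among the 2,155,876 browsers tested so far:
print(round(surprisal_bits(1, 2_155_876), 2))  # 21.04

# A value shared with half of all browsers carries just one bit:
print(surprisal_bits(1, 2))  # 1.0
```

So roughly 21 bits is the ceiling for a dataset of that size, which is why a 21-bit result means the browser is unique in it.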
Not necessarily; they fingerprint you using the info the browser gives them. This site, for example, uses JS to do the fingerprinting, but it would be just as easy (though perhaps less scalable) to do the fingerprinting server-side.
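A minimal sketch of what a server-side variant could look like, using only request headers the browser sends anyway. The choice of headers, the SHA-1 hash, and the sample header values are all assumptions; this is not how the demo site works.

```python
import hashlib

# Headers every browser sends without any script running (selection is an assumption).
PASSIVE_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")


def passive_fingerprint(headers):
    """Hash a fixed set of request headers; cookies and other headers are ignored."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    material = "\n".join(f"{name}:{normalized.get(name, '')}" for name in PASSIVE_HEADERS)
    return hashlib.sha1(material.encode("utf-8")).hexdigest()


# Hypothetical requests: same browser, cookies cleared between visits.
first_visit = {
    "User-Agent": "ExampleBrowser/18.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.8",
    "Cookie": "session=abc",
}
second_visit = {
    "user-agent": "ExampleBrowser/18.0",
    "accept": "text/html,application/xhtml+xml",
    "accept-language": "en-US,en;q=0.8",
}

print(passive_fingerprint(first_visit) == passive_fingerprint(second_visit))  # True
```

Headers alone carry fewer bits than the JS-collected data, but they need no script at all, so blocking JS does not stop this part.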
It seems like the biggest information leak is installed fonts. If I install one extra font above the default for my OS, I am leaking a good deal of information:
https://panopticlick.eff.org/
Would it not make sense for my OS to sandbox which fonts can be accessed by my browser? If a webpage wants to use a special font-family, I could be prompted to allow/block access to my greater font library.
Does anyone have any idea how well this works in corporate environments where the typical workstation is a clone of all the others behind the same (NAT'ed) address?
Underpants could be a synonym for underwear, in which case it could make sense: you'd like to keep your fingerprint private on the web, and your underwear private in real life.
Yeah, but at what cost? I already have an unfavorable view of it going in, just based on the name (that's human nature). In this case, the project isn't really all that clear on what it does; I need to do a bunch of stuff to see its benefit. I'm busy, I've lost interest. See the problem?
That's just my point of view, of course, but how many others will share it? I'd wager a good number would.
You could accomplish the same thing using local storage in an iframe with postMessage, and it'd be a lot more robust, with fairly significant browser support (IE8+).
I built a demo a year ago that let you store personal data and exposed a postMessage API for storing and sharing permissions and personal data with sites, as kind of the beginnings of a poor man's client-side-only OAuth.
After typing "meow" and hitting enter, the copy-pasted URL had this to say about me: "It seems you didn't save the word. Go to lab.cubiq.org/underpants first."
The unique fingerprint is also different. "93615388f7f54cd79d2f806ac3795c182217aa9b" somehow became "f37ec3fdd05c27c13cbb7fcdef95cc004297f62d" after copy-pasting.
Other than that technical glitch for me (Linux, Chrome latest unstable version), I still think this is actually a pretty good idea. But will websites use it, now that the ones we actually want to worry about are injected into every website via Tweet and Like and + and whatever buttons?
Google in particular is everywhere with their gAnalytics tracking code.
edit: now that I think about it, I may have misunderstood the point. Was it a proof of concept of providing cross-site tracking without tying to a personal identity?
If not, insecurities in cross-site whatever hardly matter when I am logged into every little tidbit that is loaded via iframe and appears on almost every website. Even porn sites have like buttons these days.
You have to remember that it's not just about the physical tracking, it's also about time. If the page takes seconds to run it's not feasible on billions of requests because the tracking software sits in between the page getting paid and the target destination. If your tracking script takes seconds (this technique only works after the page is loaded), or provides a white screen jump period, it isn't something that will be desired. Unless of course all other options are removed.
Getting all pages on the web to remove the old image based cookie tracking in support of JS etc... also will never happen unless extreme circumstances occur. Most of the people running ad sites have no idea what you are talking about anyway in realms outside of Cookie and Tracking.
> This technique can be used to find out some of the softwares installed on your system. For example I can say that you probably don't have Adobe Creative Suite installed.
Sorry, I have CS3 installed and fully licensed. But still a nice demo.
This fingerprinting can be defeated by NoScript, or by turning off JavaScript. A sensible idea would be to make an extension that purges the unique elements from the set they're tracking (i.e. fonts, plugins, mimeTypes, screen size and pixel ratio, etc.) and provides a whitelist for sites you want to have that information.
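The core logic of such an extension could be sketched like this. It is a plain Python illustration of the idea, not browser extension code; the generic values, host names, and function name are all made up for the example.

```python
# Bland stand-in values handed to sites that are not whitelisted (all assumptions).
GENERIC_PROFILE = {
    "fonts": "",
    "plugins": "",
    "mimeTypes": "",
    "screenSize": "1024x768",
    "devicePixelRatio": "1",
}


def attributes_for_site(host, real, whitelist):
    """Return the real attributes for whitelisted hosts, generic ones otherwise."""
    if host in whitelist:
        return dict(real)
    # Replace only the tracked attributes; anything else passes through unchanged.
    return {key: GENERIC_PROFILE.get(key, value) for key, value in real.items()}


# Hypothetical real values and whitelist.
real = {
    "fonts": "Arial,Helvetica,MyRareFont",
    "plugins": "Flash 11.2",
    "screenSize": "2560x1440",
    "timezone": "-60",
}
whitelist = {"webapp.example"}

print(attributes_for_site("webapp.example", real, whitelist)["fonts"])   # Arial,Helvetica,MyRareFont
print(attributes_for_site("tracker.example", real, whitelist)["fonts"])  # (empty string)
```

One caveat with this approach: if few people run such an extension, the generic profile itself becomes a distinguishing value, much like the NoJS niche mentioned earlier in the thread.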
Fingerprinting can be very useful to you as a user, as well. Imagine that you switch between devices and work locations all the time and you use a core suite of web applications. Fingerprinting could be used, with your permission, to uniquely identify you across all your devices, locations, and browsers for the suite of services that you depend upon. Not having to ever enter a password again while remaining secure sounds pretty nice to me.
It is not the technology, but the evil application of said technology that is evil.
Today I learned that surfing with different Firefox profiles out of privacy concerns is only useful if you disable Flash. Or any other plugin, for that matter.
In 2008 I worked at a major credit card company and they were building the exact same thing, only with more like 75 attributes. Of course it was all through a 3rd party so they wouldn't have any PII, but it was their design. They'd do this to build super-cookies and then track prospects across multiple products. It was awful.
It depends on what they are looking for. For example, a company could gather information about your browsing habits using advertising. If a company like Google distributes ads across a wide variety of sites, they can use your fingerprint to gather lots of information about your browsing habits.
This is scary not because it enables them to provide more relevant ads, but because it enables them to sell personal information to organizations like corporations and the government. Imagine your boss being able to buy a package that tells him what type of porn you like, how often you view porn, your most visited subreddits, etc.
And often from this information, these companies (based on aggregate data) can make more sweeping generalizations (that are often incorrect, but also often right on the mark) like income bracket, ethnicity, drug use habits, sexual orientation, etc. These approximations can also be bought.
Imagine that your 'package' has some information in it deducing that you regularly use cocaine. Even if this is not true and the person observing this information knows it might not be true, the fact that it has been stated might be enough to lose you some important opportunity.
I don't know how common it is for someone to buy this information, but I know that the information is already out there and the potential for things like this is very large.
Can anyone explain what use-case this technique enables that is not served by cookie tracking?
Or is the point that disabling cookies is not sufficient to avoid being tracked? Everyone has cookies enabled (or many sites don't work), so if that's all it is, nbd..
I would like to remind you that when you open incognito, it specifically says:
"Going incognito doesn't affect the behavior of other people, servers, or software. Be wary of:
- Websites that collect or share information about you"
Incognito only prevents information from being stored on your computer, not anyone else's.
"Your word has been saved! Now point your browser to any of the following addresses (copy-n-paste) and watch the magic."
Followed by, "It seems you didn't save the word." on the two connecting websites.
Safari 5.1.5, OS X 10.6.8, no add-ons installed. I do, however, have Flash configured to not allow any websites access to local storage unless I specifically say so.