Crooked Style Sheeding – Webpage tracking using only CSS (github.com/jbtronics)
532 points by ProfDreamer on Jan 16, 2018 | 175 comments



If you're concerned as a user of a malicious site:

* Link click tracking - So what, the site could route you through a server side proxy anyways

* Hover tracking - Can track movements of course, but doesn't really help fingerprinting. This is still annoying though and not an easy fix

* Media query - So what, user agent gives this away mostly anyways

* Font checking - Can help fingerprinting...browsers need to start restricting this list better IMO (not familiar w/ current tech, but would hope we could get it down to OS-specific at the most)

If you're concerned as a site owner that allows third party CSS:

* You should have stopped allowing this a long time ago (good on you, Reddit [0], though things like this weren't among the stated reasons)

* You have your Content-Security-Policy header set anyways, right?

Really though, is there an extension that has a checkbox that says "no interactive CSS URLs"? I might make one, though I'm still figuring out how I might detect/squash such a thing. EDIT: I figure just blocking url() for content and @font-face src would be a good compromise, so as not to break all sorts of background images for now.

0 - https://www.reddit.com/r/modnews/comments/66q4is/the_web_red...


> * Media query - So what, user agent gives this away mostly anyways

I was genuinely surprised how much data iOS and Android devices tend to put into the user agent: not only the exact patch level of the browser, but also the OS patch level. Android devices even tend to broadcast the precise device model -- more precisely than you could tell just by looking at the device!

Some examples:

Mozilla/5.0 (iPad; CPU OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0 Mobile/14G60 Safari/602.1

Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_1 like Mac OS X) AppleWebKit/604.4.7 (KHTML, like Gecko) iOS/16.0.7.121031 Mobile/15C153 Safari/9537.53

Mozilla/5.0 (Linux; Android 7.0; LG-H840 Build/NRD90U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.111 Mobile Safari/537.36

A Linux in comparison:

Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0

While it includes the version number, no patch level (57.0.X) is included.


The user agent is such a mess; why should any website know all that? Why should a website know anything about the visiting guest? They should be using feature detection instead. Let's get rid of the user agent, or just put "Mobile/phone", "Desktop" or similar in it. Maybe an OS, a short browser name, and the major version number for statistics.


> why should any website know all that?

As a developer:

Without user agent: How would I easily detect which browser breaks a certain feature on my project?

If I deploy a new feature and see through logging that browser X is not able to do Y, then I can install X on my machine, test, and fix it.

If I don't have a user agent, I can only detect that after the deploy there are more cases where Y fails, but I don't know which browser is responsible.


As a developer: If we actually pushed browsers to fix things, you wouldn't need to worry about that. Why should the job fall to you to work around their shitty implementation of the spec?


Because when management asks you why their site that they paid hundreds of thousands of dollars for doesn't work on <insert major browser here>, your answer can't be "the browser's implementation of the spec is shitty, blame them." Your answer is going to be, "Yeah, sure, let me fix that."


> your answer can't be "the browser's implementation of the spec is shitty, blame them."

If it's a major browser which management cares about, then you should be testing with it already. If you're not, then logging user agent strings isn't going to help.

Logging user agent strings would help if, for example, an unexpectedly large proportion of users are using a "non-major" browser in which your site is broken.

If the proportion is small, management won't care.

If the proportion is expected, then market/demographic research is partly to blame; update the spec.

If the browser is "major", you should be testing with it anyway.

If the site isn't broken, there's no problem.


I see what you're saying, but unfortunately, especially in enterprise, the browser version is often locked to something quite old. One of our clients has locked to Chrome 48.

Even if Chrome followed the spec to a T, programmers still write bugs. So, I'm not going to expect a browser (at least) 15 versions old to behave perfectly. And we all know that the spec isn't perfectly implemented.

So, no. Unfortunately sometimes there are things that will make management care a lot about a browser that they really shouldn't.


> Unfortunately sometimes there are things that will make management care a lot about a browser that they really shouldn't.

I never said management should or shouldn't care about this or that browser. I never said anything about browsers being new or old.

I said that developers should be testing with whatever browsers management cares about. If management care about it, and there's some justification, then add it to the spec.

> unfortunately, especially in enterprise, the browser version is often locked to something quite old. One of our clients has locked to Chrome 48.

That's an excellent justification for having Chrome 48 compatibility as part of the spec, so you should already be testing your sites with it. What has that got to do with user agent strings?

Is Chrome 48 even old? I tend to ensure IE6 compatibility, unless I have a good reason otherwise (e.g. voice calls over WebRTC, or something). When I'm using w3m, e.g. to read documentation inside Emacs, I occasionally play around with my sites to ensure they still degrade gracefully.


Well, your answer should be: I'll reimplement it using known, simple, stable technologies. But for some reason our industry hates those things.


So... not the web, then? :-P


Management is not some all-powerful, all-knowing spectre impervious to persuasion. Don't give up so easily.


Don’t forget “Also, it’s going to cost $X more.”


Because 100% of implementations are differently shitty. There's no amount of "pushing browsers to fix things" that is going to catch 100% of novel interactions resulting from different combinations of the declarative HTML and CSS languages out in the wild (especially when JavaScript then comes along and moves all those declarations around anyway).


Sure, and the browsers that stray further off will get used less and die off.

And if you are using the latest and "greatest" JS features, you have to expect the failures that happen. If you enjoy sitting on the bleeding edge, don't complain about getting cut.

If you implement features using known, simple and stable tech, things will generally work great without needing to worry about special cases.


"Yeah sure thing boss. I'll get on the phone to Microsoft and ask them to fix that issue in IE8 that you insist needs to be supported."


So you think a better way is spending your evening trying to fix your square pegs so that they fit in round holes?

Why would you willingly do that to yourself? If we pushed browser developers to actually do their job, they wouldn't be throwing their weight around like they do now.


Yeah why ask to be empowered to fix your own problems when you could just beg someone else to fix them?


How is a browser not rendering correctly not their problem?


Who said it's not their problem? But it's also relying on someone else to fix something that you could fix. If I need to get somewhere, it doesn't matter that my car's engine is broken because the company botched it; I just need a working car. I can sit around whining about how awful the car company is, but that doesn't get shit done fast.

Extreme ownership of problems. It's a really helpful concept. You'll stop trying to blame people all of the time for things that you can control and find solutions for them instead. On top of that, if you can't control it you can let it go as something that you can't fix.


If getting shit done fast is your goal, then you are gonna get burned, and I have very little sympathy for you. We should be focusing on getting shit done solid. If it's such a big deal that something works, why build unstable systems in the first place?

If you need your car to be reliable, don't bolt experimental features onto it, and test it before you need to take it on the road.


Not exactly related to your point about the user agent giving all kinds of arguably unnecessary information, but there's an interesting write-up about why the user agent string is the mess it is, for those who haven't seen it already.

https://webaim.org/blog/user-agent-string-history/


The user agent string definitely has a place on the web, the problem is that it's been used and abused by web developers in the 90s and 2000s when trying to deal with the utter mess that was "browser compatibility" back then.

I run whatismybrowser.com and it's a perfect case of why user agents are useful information. It'll tell you what browser you've got, what OS, and whether you're up to date or not. It's extremely useful to know this info when helping non-tech users - you would not believe how many people still reply "I just click the internet" when you ask them what browser they're using. My site helps answer all those complicated "first" questions.

I completely agree that using User Agents for feature detection/browser compatibility is a terrible idea, but apparently enough websites still do it to warrant keeping all that useless, contradictory mumbo jumbo in it - that isn't what it should be used for any more!

And also, I don't think there's any problem with including "too much" information in the user agent either - case in point: Firefox used to include the full version number in the user agent, but now it only shows the major version number, not the exact revision. The problem is I can no longer reliably warn users if they're actually up to date or not.

The reasoning for this is given as a security concern, which I still don't understand - if there's a security problem in a particular point revision of Firefox which can be exploited by a malicious web server, odds are they're just going to try that exploit against any version of Firefox and it either will or won't work - how does the malicious site knowing the exact version make the situation any worse?!


Feature detection is equivalent to user agent string, from a fingerprinting perspective.


Google makes the OS, but they also sell ads. Seems pretty advantageous to them to make their devices easy to fingerprint.


Not possible in the short term. Many sites freak out when accessed with a non-standard user agent.


I've always thought this. Just code to the standard, and if the browser doesn't render it correctly, then tell the user to fuck off and fix their browser.

I don't know why we ever thought sending all this data to the server was a good idea


If 99% of websites you visit work great, and one website you visit tells you to fuck off and fix your browser, are you going to do that, or are you going to just not use that site?

Remember: incentives. The goal of a web developer is to make sites people use.


Yeah, I would just leave that site. But if you implement that feature using known, simple, and stable tech, you won't really have that problem.

> Remember: incentives. The goal of a web developer is to make sites people use.

The goal should be to empower users. Anyone can make a site that people "use"


Does empowering users get the web developer paid?

I mean, in an ideal world, of course it does. But again: incentives. Keep in mind: search engines themselves are extremely empowering, and they are not generally considered to be something a person pays directly for.


Yeah, empowering users does get developers paid. I get paid to do that myself, and know a lot of other people who also get paid to do that. Of course it's sometimes easier to get paid by treating users like cattle. But if someone doesn't intuitively understand why screwing their users is a bad idea, I'm not sure I can help them.


Firefox, on the other hand, discloses very little, even on mobile:

    Mozilla/5.0 (Android 4.4; Mobile; rv:41.0) Gecko/41.0 Firefox/41.0


I believe Safari will be freezing the user agent string soon; Safari Technology Preview is already doing this (it's "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15" if you're curious).


> If you're concerned as a user of a malicious site

Or if you're concerned as a user of a regular "safe" site... Google Analytics does all of these things: link tracking, hover tracking, media query tracking. GA or something like it is being used by vast swaths of the web. I don't claim it's the majority, because I don't know, but that's what I assume, that all sites are tracking (whether or not the site even knows it.)


Right, which is why GA is not allowed to load in my browsers. This is a different avenue for disclosure of a similar bundle of info.


Just curious: Are you okay if it's something self-hosted (like Piwik) doing the same kind of tracking?

In other words: Is it the third party (like Google) you don't want to have that data, or the second party (the web administrators)?


IMHO they are different. So much of what makes the web and surveillance creepy is the ability to correlate across data sources. CCTV isn't creepy when one store owner has it - it's bad when the government has it everywhere.

It's not bad for people to analyze how users interact with their site. It's bad when one entity (or a handful) can track you across the Internet.

So in other words, I don't mind Piwik, and have considered sending in a patch to uBlock and others with a switch to disable "locally hosted analytics" or something similar. Like the drive to push "ethical advertising", I think it's reasonable to permit some benign tracking as a way to coerce more sites into decentralizing user analytics.


I'm fine with self-hosted analytics, and use a log analyzer myself.

My primary objection is automated profile generation and identifier sharing - third parties don't need realtime updates on my reading habits. I like to think folks who run their own analytics aren't sharing identifiers with adtech shops, but of course I can't know.


GA is present on 46% of web traffic [1] or 65% of top web pages [2], so you could be justified in claiming the majority...

[1]: https://whotracks.me/trackers/google_analytics.html [2]: http://randomwalker.info/publications/OpenWPM_1_million_site...


> * Font checking - Can help fingerprinting...browsers need to start restricting this list better IMO (not familiar w/ current tech, but would hope we could get it down to OS-specific at the most)

Oh my. I wish this madness ended. Quoth tedu:

> I don’t know a whole lot about typography and fonts, but there’s two things I know about font files. They’re ridiculously complex and their parsers have only just begun to experience life with hostile inputs. In short, I’d put fonts second on my list of files likely to pwn your browser, after Flash [...].


I've thought for some time that the only reason people have not exploited fonts to take over browsers is because even hackers don't understand how they work.


Note that browsers pass downloadable fonts through a sanitizer before they even consider handing them off to anything else that might need to parse the font. And browser security teams have spent years now fuzzing both those sanitizers and various font libraries...

There's still a lot of attack surface here, but "only just begun to experience life with hostile inputs" isn't quite true either.


> * Media query - So what, user agent gives this away mostly anyways

It doesn't; without media queries you can't detect things like browser window size or screen pixel density.

> * Font checking - Can help fingerprinting...browsers need to start restricting this list better IMO (not familiar w/ current tech, but would hope we could get it down to OS-specific at the most)

There's a lot of trade-offs here. Plenty of people have fonts installed for various reasons (some because none of the system fonts cover a script they want, some because they're using fonts designed to mitigate some issues caused by dyslexia, etc.), and breaking it for those people would not be good.


> It doesn't; without media queries you can't detect things like browser window size or screen pixel density.

Sorry if I wasn't clear. I shouldn't have said media queries; I should have said "CSS property queries". Which CSS properties your browser supports doesn't leak much more than your UA does, I would guess.

> There's a lot of trade-offs here [...]

I'd settle for a strict subset as an option (though I'd prefer it as opt-out, to discourage font-list-based fingerprinting as a practice, though metric-based fingerprinting may never go away). With downloadable fonts available, I don't really like the "script they want" excuse. On the accessibility side I am admittedly naive, but I would assume such a font could be a substitute for an existing font name. Unique font names per user seem unnecessary.
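For reference, the font check in TFA boils down to roughly this, as far as I can tell (the endpoint is a placeholder): the fallback web font - and therefore the tracking URL - is only fetched when the locally named font is missing.

    @font-face {
        font-family: font_probe;
        /* only downloaded if the browser actually needs the fallback */
        src: url("track.php?font=Calibri");
    }
    #font_check {
        /* if Calibri is installed locally, font_probe is never requested;
           #font_check needs some text content for the font lookup to happen */
        font-family: Calibri, font_probe;
    }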


>> Sorry if I wasn't clear. I shouldn't have said media queries; I should have said "CSS property queries". Which CSS properties your browser supports doesn't leak much more than your UA does, I would guess.

>> Unique font names per user seem unnecessary.

As a dev that has worked in CSS for over a decade, I have no idea what either of those mean.

EDIT: I see the unique font names part now, totally missed that while focusing on the other parts.


By "CSS property queries" I mean the technique in TFA using @supports + before/after-content URLs to query whether certain CSS properties are supported.


I see my problem now: I don't think of @supports as a query in the way I do media queries.


If somebody has fonts installed for specific reasons then presumably they don't want web pages changing them. Web pages shouldn't need any more control than selecting from "serif", "sans-serif", or "monospace". The most legible font is the font you're most familiar with. I don't want web pages to use different fonts just because some marketing drone thought it was good for branding. It's disappointing that we allowed websites to abuse fonts for vector icons.


Have you heard of this term, 'design'? Plus, sometimes you can't adjust the website to work well with all (or even the popular) sans-serif fonts.


Give one example where users benefit from the website forcing a special font. And even if that happens, the website can provide the font to the user. None of this requires the user telling the website host which fonts are installed.


In the case of reddit, the custom CSS could not reference off-reddit resources (images were uploaded), so this technique would not work.



> not an easy fix

All of this is an easy fix: disable css. In the same way that "I don't want to be tracked by javascript" can easily be resolved by disabling javascript. I'm not seriously suggesting everyone does that, but anyone who is so paranoid that they don't want a site knowing that they're reading its content might want to consider it.


Happily used to (5 years ago) surf the web with no JS and no CSS, or rather applying my own style-sheet for 90% of my web viewing. I'd fall back to Chrome when absolutely necessary. It was fast and comfortable, it just relies on well structured accessible content.


There's also a nice Open Source project which implements that using a proxy:

https://www.tedunangst.com/flak/post/miniwebproxy


> it just relies on well structured accessible content.

Honest question: how much of this is left? What popular sites are still accessible this way? HN might be the only site I visit frequently where browsing with no js/css has any hope of working.


Try it and see. I block a ton by default[1]; most sites are just fine without it.

I get that some people have low tolerances for things not being perfect. CNN stories without JS usually have a pile of empty images at the top, for instance. But that is probably fixable; I just haven't bothered to figure out which bit of JS to allow for that.

Usability depends on your tolerance for imperfections vs. your tolerance for being observed.

[1] Current setup uses JS Blocker 5, uBlock, an aggressive cookie manager and my home proxy, which does a ton of things, many of which I don't even remember at this point.


Check out surfraw (by Assange); you can actually access a surprising amount of resources using sr and lynx from a terminal. Text only, but it still makes the internet pretty useful.


Are you aware of any piece, writeup, review, or anything else on Assange’s programming skills?


No JS is surprisingly fine. Google search works, Gmail works, Maps doesn't :)


I browse HN using w3m, from which I'm commenting right now. It works on probably 80% of the links I attempt to visit. In many cases it works better than a graphical browser: I only see article text, and for reasons I haven't investigated I often seem to be ignored by paywalls. I never see subscription nagboxes or ads. Sometimes I have to search forward for the title to skip the load of garbage that precedes the article text.


While we, as devs, may get tired of the constant tug-of-war between site flexibility and privacy, many of our users are unaware. They will go blindly toward flexibility, and we have a duty to find as much compromise as possible between those two values, lest we just say "it's an easy fix, just turn off your computer". There has to be a middle ground between the extremely paranoid "turn everything off" and being extremely liberal with my anonymity (and on the internet, it's not governments who are going to help find it).


I guess my point is "how much anonymity is it reasonable to expect?" Should I have a problem with the fact that nigh-on every URL in the world will leave behind a little footprint when I request it? I don't see an enormous problem with a website anonymously recording the fact that I've clicked a link.

("anonymously" assuming I'm blocking their cookies, which I would if I were that paranoid)


> ("anonymously" assuming I'm blocking their cookies, which I would if I were that paranoid)

Note that cookies are mostly a convenience vis-a-vis tracking. For user tracking, there's nothing really stopping the server from vending you a version of their site with custom CSS that loads images with a fingerprint in the URL, which would still work with cookies disabled. That'll get you coherent signal on a session (gluing different sessions together would be a bit more challenging, of course, but I wouldn't be surprised if it were possible).
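i.e. nothing fancier than serving each visitor a stylesheet along these lines (the identifier here is made up):

    /* generated per session, no cookie required */
    body {
        background-image: url("/bg.png?visitor=3f9c2a71");
    }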


I don’t mind sites tracking to know what products sell and which don’t, what browsers people use, or how long I spend on the site, etc.

What I hate is the fact that I go to Agoda, I search for hotels in Jiufen in Taiwan, I look at only 2, I book one of the 2, I close it. Then I open up Facebook on my phone seconds later and have adverts saying: hey, how about these 2 hotels in Jiufen?

That shit annoys me. Stop following me and tracking what I’m doing and sharing it with all these companies. It makes me want to not use the internet...


A footprint is not a fingerprint. This whole thread is about non-anonymous tracking.


Completely disabling CSS will make a lot of sites inaccessible. Partly disabling it (for example the ::after part) would be interesting...


Wonder how complicated a CSS Parser / White-listing extension would be?


Complicated enough to be a security nightmare


That's pretty much the "reader mode" available in some browsers, no?


Sure. I think it's mainly interesting in that a CSS injection vulnerability can turn into tracking. It never occurred to me before that a CSS injection vulnerability could do anything actionable.


> So what, the site could route you through a server side proxy anyways

There's little interest in proxying through a token system (this would require a DB read at each click, and a DB write at each page generation), which means the actual link is available client-side and the whole thing can be bypassed.


It's easy to design a system like this where the actual link isn't available client-side, and the server doesn't need to wait on a DB read and write before responding to the client: make the URL parameter be the destination URL encrypted so that only the server can read it. That kills the need for a DB read. Then the server can respond to the request before the DB write finishes since the integrity/consistency of that write is likely less critical than the response time.


That's correct.


Most implementations that I've seen, including Google's, just put the linked URL in the query params of the redirect endpoint URL.

'/redirect?url=...'


Link 0 is inaccurate. Reddit retracted that statement on CSS, saying mod-created styles will stick around.


Wouldn't the Tor browser solve most of these issues?


Who's going to be first to make the 'I always browse the Web with CSS disabled' post?


Stallman of course.

> I usually fetch web pages from other sites by sending mail to a program (see https://git.savannah.gnu.org/git/womb/hacks.git) that fetches them, much like wget, and then mails them back to me. Then I look at them using a web browser, unless it is easy to see the text in the HTML page directly. I usually try lynx first, then a graphical browser if the page needs it (using konqueror, which won't fetch from other sites in such a situation).

https://stallman.org/stallman-computing.html


Using lynx, the only thing that makes reading Hacker News somewhat inconvenient is the lack of indentation to show the nesting hierarchy, but otherwise it works quite well.

Some other sites are so messed up that it's actually more comfortable to read them in a text-only browser that completely ignores CSS and replaces images by their alt-tags.

Of course I frequently do want to look at images, so my main browser remains Firefox, but it's still useful to remember that other browsers with different tradeoffs exist and can be used.

Sometimes, you really just want to read some text and don't need any of that fancy other stuff.


You can see the indentation if you use w3m. HN uses tables to structure the comment hierarchy, and the w3m browser does a pretty great job rendering tables.


It'd be much nicer if HN used nested lists (without icon) for comment structuring. That'd also work fine in many more textmode browsers.


w3m sets the column width for the HN nested table spacer all to one value, so you visually only get two levels of nesting. Here is a screenshot of this thread as rendered by w3m:

https://lh3.googleusercontent.com/rZ1yOj55fvQqtWWbOnoSTpvCgx...

w3m table rendering is based on a heuristic algorithm and fails in some cases. See the "Table rendering algorithm in w3m" section in:

http://w3m.sourceforge.net/STORY


True! I fixed that issue in the community maintained HN version here, https://github.com/arclanguage/anarki/commit/0d6cea75d902899...

Would be nice if HN would fix it, too.

I always browse HN using the links2 browser. No CSS! (Although elinks is pretty interesting, in that it's a text-only browser that implements some CSS.)


Firefox's "reader view" is quite good for that (and to avoid bloggers terrible choice of fonts/font color/font size)


If a website doesn't look like it was made in the last couple of years (think: Medium-like centered content with large fonts), I click that handy reader view button out of habit.

I can't stand reading articles with <18px font size. Some pages (like HN) I simply zoom in to something like 150%, but if it's just an article, hitting that button is easier to me than zooming in.


Links (a fork of Lynx, IIRC) does images; it might be off by default, I can't recall. Back when I used Slackware it was handy to have a terminal-based browser for looking up how to fix things.


A more interesting question would be whether it is possible to disable the "dynamic" parts of CSS in any browser - things like ":hover" and ":active" that this proof of concept abuses - and leave just the more benign static styling rules.


Perhaps preventing the loading of URLs in the dynamic parts would be enough?


Probably not, you'd also need to disable a lot of other optimizations.

For example, a browser will not load an image if it's set to `display: none` in CSS (at least not right away). That could be abused to trigger the download when the CSS changes, without a URL needing to be in the CSS at any point.


Well, I have Firefox's `layout.css.visited_links_enabled` set to false so as not to leak history.


Even if that's set to true (the default), doesn't Firefox prevent the page from reading the :visited state of links? I'm not sure what the privacy value of that pref is.


Blind users can’t be seen.


Screen readers interface with normal browsers, so JS and CSS will be loaded as per usual (unless the user has gone to the trouble of turning them off).


Or, at least, disabling `url()` in CSS.


Indeed, that would be enough to stop the dynamic tracking - no need to go full no-CSS.


Does any browser actually allow you to do that?


I'd say only for "content" and @font-face "src" for now.


> Who's going to be first to make the 'I always browse the Web with CSS disabled' post?

Is it time for a gopher[0] revival?

0 - https://en.wikipedia.org/wiki/Gopher_(protocol)


Wouldn't something like this be enough?

    ::before, ::after {
        content: '' !important;
    }


You have no idea how many websites you would break :) I've seen UI strings in "before" and "after" pseudo-elements...


Don't think it would be a huge loss. The most common use case is using Unicode characters to render font icons (e.g. Font Awesome).


No, easily defeated. It would start a selector specificity war.
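Right - when both declarations are !important and both come from author stylesheets, the more specific selector wins, so a page can simply out-specify the blanket override (selectors and endpoint here are made up):

    /* proposed block: specificity (0,0,1) */
    ::before, ::after { content: '' !important; }

    /* the page's rule: specificity (1,1,1) - wins, beacon still fires */
    #link1:active::after { content: url("track.php?click=link1") !important; }

(A genuine user stylesheet would be different, since !important user rules beat !important author rules.)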


'I always browse the Web with CSS disabled'!!!!111oneoneone /s (is this how it works??)

Doesn't everyone?? ;)


I don't see what's problematic about this. The tracking is not really done in CSS, so much as on the server. You could accomplish the same thing with 1x1 images, or loading any remote resource. Effectively the only difference is you're loading the URL conditionally via CSS, as opposed to within a `<script>` or `<img>` tag. Furthermore, this can be blocked in the same way as any tracking URL.

I concede this is a novel way of fingerprinting the browser from within the client, without using JS. However, I think a better way to describe this would be "initiating tracking on the frontend without the use of javascript."


The difference is that CSS can trigger remote resource loads in response to post-pageload user behavior, which intuitively seems like a JS-only thing. For example, tracking where the mouse has moved, as mentioned in the readme.

I wouldn't say it's some sudden, alarming capability, but it is distinctly more capable than <img> tags.
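For the curious, the hover and click variants in the repo come down to rules like these (IDs and the endpoint are placeholders); the request fires the first time the state applies and is cached afterwards:

    /* fired when #menu is first hovered */
    #menu:hover::after {
        content: url("track.php?event=hover&id=menu");
    }

    /* fired while #link2 is actively being clicked */
    #link2:active::after {
        content: url("track.php?event=click&id=link2");
    }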


Think about people using extensions like NoScript to block JS because it offers this functionality. This is fairly relevant to these people, as they clearly also need a “NoCSS” extension.


This stuff is relevant for sites that allow users to upload CSS to be used by other people. If I can make a subreddit or a social media page on a site and upload custom CSS for it, then I can make the CSS trigger requests to my own personal server on certain events and track the people who visit my subreddit/page. (Reddit adds certain restrictions to CSS that can be uploaded to it to defend against this.)


About 8 years ago, a colleague and I interviewed a nervous kid fresh from undergrad. He was applying for a junior front-end position at our fast-growing startup. Dressed in a shiny, double-breasted suit and wingtip shoes, he followed us into a tiny office (space was so limited) where we conducted interviews.

"Tell us about your CSS experience," we asked him.

"Ah, yes. I, well, haha, of course. The CSS is where you make your calls, to the database, ah, server, ah, of course."

Unsurprisingly, we did not hire the applicant, though his answer to our question lived on in infamy for many years. But all that changed, today, reading this. The joke was on us. That kid was clearly from a future of which we had no awareness. Starting today, I'll always trust programmer applicants donning double-breasted suits.


Reminds me of similar techniques that could be used several years ago to sniff browser history via a collection of a:visited rules.


> However using my method, its only possible to track, when a user visits a link the first time

This suggests that browser history sniffing is still possible - as long as you make the user click the link (in contrast to the old a:visited method where this could be done with no user interaction)
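(For comparison, the old a:visited trick needed nothing more than the rule below - long since neutralized, because browsers stopped applying resource-loading properties to :visited. Endpoint is a placeholder.)

    /* historical history sniffing; modern browsers ignore this */
    a[href="https://example.com/"]:visited {
        background-image: url("track.php?visited=example.com");
    }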


Any part of a browser that can make a request can be used to do this sort of thing. Any part of a browser that can alter the view and its related DOM attributes can cause a user to interact with it and give up data involuntarily.

Turn off JavaScript, and CSS media queries can still cause resources to load based on a number of parameters. Have canvas enabled and you can be fingerprinted. Use one browser over another and get feature-detected. Anchor states give away browsing history. Hell, even your IP address sacrifices privacy, and that's before the page gets rendered.

So with that being said, if you're browsing the web, you're giving up information.
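To make the media query point concrete: it's just conditional resource loading, roughly like this (endpoint and breakpoints are made up, and the probe elements have to actually be rendered on the page):

    @media (max-width: 600px) {
        #mq_probe_small { background-image: url("track.php?viewport=small"); }
    }
    @media (min-resolution: 2dppx) {
        #mq_probe_hidpi { background-image: url("track.php?dpr=2"); }
    }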


Very smart! This is a few lines of code away from a CSS-class-based mini tracking framework...

Aside from the obvious, this could also be used as a fallback (restricted) A/B testing mechanism for no-JS users? I'm thinking data about just what was hovered and clicked, plus media queries, allows for some basic UI testing of responsive websites.


This doesn’t mention my personal favorite CSS tracking trick: timing attacks that can be used to detect what sites you have loaded. This can be done by interweaving requests to a remote URL (say, a background-image) with requests to your server script, which times the differences.


The fanciest tracking trick is the HSTS supercookie.

You use a bunch of subdomains -- a.example.com, b.example.com, etc. -- each configured so that a particular URL (call it the 'set' URL) sends an HSTS header. A different URL (the 'get' URL) doesn't.

You generate an ID for the user, and encode it as a bit pattern using the subdomains to indicate positions of '1' digits. Say your ID is 101001 -- you serve a page which includes images loaded from the 'set' URLs for subdomains a, c, and f. On later page loads, you serve a page including images loaded from the 'get' URLs of every subdomain, and you pay attention to which ones are requested via HTTPS. Since the 'set' URL sent an HSTS header, subdomains a, c, and f get requested over HTTPS, and now you reconstruct the ID from that: 101001.


> The fanciest tracking trick is

I feel like this changes all the time; I was recently surprised to discover 'TLS Client Channel ID' (my nomenclature is a bit fuzzy - an RFC for automatic client certs "for security") and would love to learn more about the extent of its current implementation in Chrome.

https://news.ycombinator.com/item?id=15753648

>londons_explore: In Chrome, it also uses the TLS Client Channel ID, which is a persistent unique identifier established between a browser and a server which (on capable platforms) is derived from a key stored in a hardware security module, making it hard to steal. Ie. if you clone the hard drive of a computer, when you use the clone, Google will know you are a suspicious person, even though you have all the right cookies.

https://en.wikipedia.org/wiki/Transport_Layer_Security_Chann...

http://www.browserauth.net/channel-bound-cookies


But how many users disable JavaScript in their browser to prevent tracking? And is the fact that a website can track all your clicks and mouse movements a privacy/security issue to begin with? Isn’t it by design that the website you’re visiting can track you?


> by design

By design, the web is a "1. send me the document <-> 2. here it is" transaction, not a series of many small notifications. By design, the url() property almost certainly wasn't intended to be dynamic. This is clearly 'bending the established rules' — cleverly, admittedly.


> By design, the web is a "1. send me the document <-> 2. here it is" transaction, not a series of many small notifications.

That was true in 1998, but most of the web has been turning, by design, into what you might call a series of many small notifications ever since then. Gmail's been doing it since 2004. Today, almost all large sites are running Google Analytics or something like it, which tracks everything this article discusses and operates on constant micro-transactions. All web apps are built on many small notifications, and many of them even use WebSockets, which were explicitly, by design, built for streams of micro-transactions.

> By design, the url() property almost certainly wasn't intended to be dynamic. This is clearly 'bending the established rules'

There was never a rule against it, even if dynamic usage wasn't expected or imagined (which I find unlikely). CSS allows it, therefore it's allowed by design.


Wishful thinking; today the web is clearly, obviously not a simple document retrieval system.

It's now an application platform, whether you like it or not.


The `url()` function wasn’t intended for tracking, sure. But my point is that it doesn’t matter, since it is accepted that the website you’re on can track you to begin with. I don’t think anyone in the standards bodies is trying to prevent that.


Call me naive but, as a dev, I don't see why this would be any better than using JS. The group of people that block JS is likely to do the same for this and, as mentioned by others, common sources of such mucking are blocked by a good ad blocker.

Then there is the whole "how could it be integrated into an existing site with minimal fuss" issue. With JS you can specify targets and the like for actions and observations; the only comparable thing would be to offer Sass/Less integration so that it works with clients that disable or block JS, which is arguably much more difficult.

While it is definitely clever, I just don't see a practical use for it. It would really only benefit those willing to put the work into using it and only work so long as their logging URL is available and not blocked. I just don't see the real value.


Does it need to be practical? It's just a proof of concept, as is displayed in the very first sentence in the README file.

This seems to me more like "wow, something cool has been done in an unusual way" material, rather than "this is something you should consider using".


At least in Safari 11.0.2 (macOS 10.12.6) link tracking does not work. The selector

  #link2:active::after
appears to always exist in Safari 11.0.2 ("active" is being disregarded).

I clicked on none of those links, and I have never visited google.de, but the results.php page told me all 3 links had been clicked.

EDIT: formatting, remove word.


You should let somebody know: http://bugreport.apple.com

The demo site correctly tracks me in Safari 11.0.1 (macOS 10.13.1)


Alright lads, let's all go back to RSS feeds and scrap that whole "browser" experiment.


I seriously think we need an alternative to HTML that axes styling and scripting and concentrates solely on the markup / content description. Websites would use a certain set of elements/descriptors to describe the content they contain. The user’s website reader would parse the markup / content description and display a page how it thinks it should be displayed (according to the user’s preferences). All websites would have the same styling – the one chosen by the user. This HTML alternative could provide an API that makes it possible to have dynamic websites but still prevents scripting and fingerprinting.


Against an Increasingly User-Hostile Web | https://news.ycombinator.com/item?id=15611122 (Nov 2017: 1307 points, 502 comments)

#oneofus https://hn.algolia.com/?query=13226170&type=comment (click a 'comments' link on the search results, then 'parent')

I've been connecting similar sentiment here for about a year now; I've appreciated the mention of several helpful tools in this thread.


A nice start would be browser support for Content-Type: text/markdown.


Sounds like HTML, and a browser. (Sad.)

Perhaps Firefox's Reader View would be suitable, when activated.


Something like a schema for a restricted subset of XHTML?


I know this is a joke, but you will still have tracking in the RSS reader or in the images loaded alongside the articles. The only solution is paying for software that does not track.


The images can be embedded via data URIs.


ASCII art only.


I'd suggest telex, your downloaded content is automatically backed up on paper.

:)


Interesting trick. But I think ad blockers block requests to entire tracking domains, so even CSS calls would be blocked?


It’s pretty trivial to make server-side calls to Google Analytics [0], passing lots of different data using async commands so the user doesn’t even feel the hit.

Additionally, you could queue these stats messages and send them in bulk when your server load falls below a certain threshold. I’m not talking hours, just seconds. Like a workflow engine.

0: https://developers.google.com/analytics/solutions/experiment...


Not if you're hosting your own tracking information, on your own domain


Honestly, I don't care if you use your own tracking solution on your own domain, as long as it's not passing the data to a third party.

I get that some of that data is genuinely useful in determining what parts of an app are popular and what is not. Even though I don't like being tracked for dumb shit like ads, it does have valid uses.


Most trackers are built by third parties. This is true for analytics tracking and for ad tracking. Few companies ought to track their own impressions for many, many reasons.


Why do you think it's mostly advertisers building CDNs? It's not because they really wanted to make the web faster. These are typically let through by ad blockers.


That's a good point, if you're only concerned with tracking from a specific set of domains.


I'm surprised browsers don't prefetch stuff like this, given it's an easy performance win - which would then also make these stats useless.


Prefetching resources that are never needed is a waste of bandwidth.


For many users (i.e. desktop) bandwidth is something you're throwing away if you're not using it. I'm not saying this is a good idea in all cases, but it might be in some.


Very interesting; it's always intriguing to see how much of a cat-and-mouse game this privacy stuff is. I keep thinking that this needs an overhaul and a slightly different approach altogether; sadly, I can't produce any viable solutions.

With huge and complex issues like this, I don't think we have to find one solution so much as point in the right direction, but I'm not even sure we're doing that.


In my opinion the new approach should be: Let websites deliver content and let the user’s website reader interpret the markup / content description and style the page according to the user’s preferences. Websites shouldn’t be able to style and script themselves any longer.

The website should load more content when the user scrolls to the bottom? Let the website reader retrieve the content itself. The website wants to know the dimensions of the viewport to load an appropriately sized image or change the layout? Tough luck, that's none of the website’s business! Let the user’s website reader handle it.


This is where the talent, funds, and resources will go, as ads and marketing are industries with lots of funding available. Even more tracking, of an even more pervasive kind. We hate tracking while we bet our time and money on it. The web is cancelled; go back home, everyone.


Is there a way to turn off CSS media queries in Firefox, or fake their conditions? Apart from the security issues, it's plain annoying when the page layout will change completely because a few pixels of window size are missing for the perfect experience.


Well this is depressing.


This could easily be stopped by a change in browser behavior. If web browsers contacted every address specified with `url()` automatically on page load, without considering the conditions, this type of conditional request would be impossible.

Conceivably, you could solve it through a simple browser extension that looks through all of the page’s stylesheets and calls all URLs present in the CSS before the page is rendered.

In an ideal implementation, though, URLs dependent on “static”, non-identifiable conditions, such as an image with `display: none`, would be left alone.


That's likely to have unfortunate performance implications, particularly on mobile or low-bandwidth connections.


The obvious solution is to block the server-side pages that the CSS elements link to. This kind of tracking can be mitigated the same way any other kind of tracking is already handled by uBlock or uMatrix.


uBlock can't block first-party tracking like this... just the third-party scripts that do it.

Example: You visit example-site.com.

example-site.com is the PHP server that sends you the HTML. It is also the site that does the tracking. So when you click something, it sends that data to example-site.com, which can then forward the data to a third-party tracking service.

If you blocked those server-side pages with an ad blocker or hosts file, the site example-site.com would be completely blocked too.

Ultimately, if everyone uses ad blockers to block tracking scripts, the tracking can be moved to the back end. If you block the back end, you effectively block the website you are accessing in the first place.


If it's all moved to the backend, then we win, because then we can easily control what data is being collected.


Wouldn't that require a separate blacklist for each site?


Probably not everyone would be willing to create their own user tracking solution; most websites use third-party analytics, which can be handled by generalized rules. For those that do roll their own solutions, per-website block lists would be needed, but that's how site-specific ad blocking already works. The lists are maintained by the community and updated very frequently.


Oh, that's true. I wasn't really thinking about third party solutions.


uBlock and a hosts file are standard on every computer I own.


I think it is time to split the web into:

* user and machine readable content (text with hyperlinks, pictures, audio, video, rest)

* universal app store (javascript, css, intents, permissions...)

Every user could consume or style content as they wish. If my IDE has a dark theme, I want all web pages to have a dark theme. Why do I need JavaScript to read news or browse pictures?

If a user wants to install an app from the app store, they should accept the software license and give permissions to that application.


This is an interesting concept, but I'm not seeing anything that couldn't already be done with a properly set up website and server logging.

Things like "@supports (-webkit-appearance:none)" doesn't give you chrome detection. It gives you webkit detection, which is a rather large subset of the whole. Plus some of the other browsers started supporting webkit prefixes.


> doesn't give you chrome detection

Checking every possible prefix should distinguish most versions of most rendering engines -- still not bad.


It would only, at best, give you outdated browser versions, unless you are going to create a huge set of rules checking certain properties against other properties. Plus it doesn't tell you which browser, only maybe which WebKit engine version - which tells you next to nothing.


"Interesting is, that this resource is only loaded when it is needed (for example when a link is clicked)."

The resource is retrieved using GET, so I wouldn’t think that lazy loading is required by the HTTP standard. If so, browsers could mitigate this kind of attack by pre-fetching these resources (even pre-fetching a random fraction might already be enough).

It is a neat hack, though.


Do browsers really need to allow fetching URLs in the "after" event of a link?


It's not an "after" event; the "after" is for inserting a pseudo-element. That pseudo-element then sits inside the original link element and hits the tracking URL by trying to load a resource from it when the link is active.


The tracking really only happens server-side; the CSS just dispatches requests with query-string params. Probably not an ideal production tracking solution, as it severely limits the data you can send back for better analytics.


What's "sheeding"?



How does 'check spelling as you type' work - via a previously downloaded dictionary, or is it an online service that leaks all or some of your key presses?


Nice POC! I love the project name :)


The demo shows that this technique doesn't work for "Privacy Browser" on Android. It can be obtained from F-Droid.


very good. I wonder if somebody really needs this.


Wow, CSS is the new JS :D


A lot of CSS is fluff masking low-information content. Turning off CSS helps me not have to page down x times to see a noncathartic one-liner.



