Information Leaks via Safari’s Intelligent Tracking Prevention

om2 · on Jan 23, 2020

We've addressed the issues disclosed to us, and if you try any of the 5 POCs in the paper you will find they no longer work in the latest Safari. Details of the fixes here: https://webkit.org/blog/9661/preventing-tracking-prevention-...

There may be room for more improvement here, but be aware that what the POCs illustrate is no longer an active vulnerability.

In addition, we don't believe this channel was ever exploited in the wild.

(If anyone is aware of other issues in this area, I encourage you to practice responsible disclosure and report to Apple or to the WebKit project.)

taf2 · on Jan 23, 2020

If only we had the same level of transparency in native apps. At least in Chrome, Firefox, Safari - you have dedicated large groups of engineers thinking about privacy, security and stability. Compare that to native iOS/Android, where you get isolated groups of developers able to exploit and track with little to no visibility from end users - let alone access to source code for review by third-party software engineers like we have on the web.... It's maddening that Apple doesn't invest more in its browser and yet pretends to care so much about privacy - of the major three browsers, Apple definitely _seems_ to underinvest when it comes to web technology...

Terretta · on Jan 23, 2020

> It’s maddening that Apple doesn't invest more in its browser and yet pretends to care so much about privacy - of the major three browsers, Apple definitely _seems_ to underinvest when it comes to web technology...

Well, WebKit came from Apple’s work on KHTML. So Safari, Chrome, Edge ...

“KHTML and KJS were adopted by Apple in 2002 for use in the Safari web browser. Apple publishes the source code for their fork of the KHTML engine, called WebKit. In 2013, Google began development on a fork of WebKit, called Blink.”

— https://en.wikipedia.org/wiki/KHTML

That’s not that long ago in browser families, and 2002 - 2013 is 11 years of investment in web tech that now everyone else built on. And they didn’t stop investing.

Some of those investments:

- It’s mostly been the least battery hungry modern browser (by a long shot) on the most wished for dev laptop, and in many cases, the highest performance.

- The bookmark and tab sync across devices is seamlessly slick. I regularly end up maxed on tabs (it’s in the 100s of tabs open at once) and can access any / all of them across all devices sharing an iCloud account. Also appreciate that across all kinds of devices, you can save all open tabs to a folder of tabs, then close all tabs, and immediately get at those 300 tabs in the bookmarks from another machine. All those bookmarks are searchable too. None of this slows it down.

- Built in reader mode works beautifully. Reading List is there too.

- Saving a web page to file can save clean reader views into full length PDFs. They’re amazing!

- Interacts with keychain, essentially has LastPass “built in” if you let it store passwords on your keychain.

- While I miss uBlock Origin, ad blockers like 1BlockerX work great across iPad, iPhone, and macOS. (See also AdGuard for Safari.)

- ITP performs better than one would expect for something you don’t think about at all, while not breaking most banks, which I appreciate.

- Safari never kills my iPad, iPhone, or Mac. Once in a blue moon a terrible site makes me ‘eject’ Safari from running apps on iOS. Launch it again, and all your tabs etc. are fine.

There’s a lot to like, except it’s not super tweakable, or basically “it’s not Chrome”. Even there, most devs or tech geeks who grump at Safari and reach for Chrome have no idea of the lineage.

Doesn’t seem fair to call it under-invested in.

pampedant · on Jan 23, 2020

Hey hey, "Google Chrome" (+ MS Edge) is not everyone else. Firefox exists and is based on Gecko, which is based on Netscape, they've had a separate and unbroken lineage for 23 years now.

Apple's OSS work tends to feel well-made (I use CUPS on Devuan, they own that), but they are not and will never be "The One True Web Tech Makers".

Still, they're the biggest ones on the market as of today, if you count their offspring, Blink.

arkadiyt · on Jan 22, 2020

Reposting from the other [1] thread:

Basically Safari keeps track of which domains are being requested in a 3rd party context (i.e. I load example.com in my browser and the page loads the facebook sdk - Safari increments a counter for facebook by 1). Once a given domain reaches 3 hits, Safari will strip cookies and some other data in 3rd party requests to that domain.

The problem is that advertisers can use this to fingerprint users: register arbitrary domains, make 3rd party requests to them, and detect whether or not that request is having data stripped. Each domain is an additional "bit" of data.
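A toy simulation of the mechanism described above, assuming the simplified model in this comment (a plain per-domain hit counter with a threshold of 3). Real ITP classification is more involved, and the class `ToyITP`, the helper names, and the `bitN.tracker.example` domains are all hypothetical. It only illustrates why each tracker-owned domain can act as one bit of an identifier:

```python
from collections import Counter

THRESHOLD = 3  # hit count taken from the comment; an assumption, not Safari's actual rule


class ToyITP:
    """Toy per-browser state: counts third-party requests per domain."""

    def __init__(self):
        self.hits = Counter()

    def third_party_request(self, domain):
        """Record one third-party request to `domain`."""
        self.hits[domain] += 1

    def is_stripped(self, domain):
        """True once the domain has reached the threshold and gets data stripped."""
        return self.hits[domain] >= THRESHOLD


def tracker_domains(n_bits):
    """Hypothetical tracker-owned domains, one per bit of the identifier."""
    return [f"bit{i}.tracker.example" for i in range(n_bits)]


def write_id(browser, user_id, n_bits):
    """Push the domains for the 1-bits of `user_id` over the threshold."""
    for i, domain in enumerate(tracker_domains(n_bits)):
        if (user_id >> i) & 1:
            for _ in range(THRESHOLD):
                browser.third_party_request(domain)


def read_id(browser, n_bits):
    """Rebuild the identifier from which domains are currently stripped."""
    return sum(
        1 << i
        for i, domain in enumerate(tracker_domains(n_bits))
        if browser.is_stripped(domain)
    )


browser = ToyITP()
write_id(browser, 0b1011, n_bits=4)
recovered = read_id(browser, n_bits=4)
```

Here `recovered` equals the identifier that was written. The read-back is modelled as side-effect free, which is a simplification: in practice the probing requests themselves would be seen by the browser.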

This is similar to "HSTS Cookies" [2] and also to issues with Chrome's XSS auditor, which is why it was removed [3].

[1]: https://news.ycombinator.com/item?id=22120136

[2]: https://nakedsecurity.sophos.com/2015/02/02/anatomy-of-a-bro....

[3]: https://twitter.com/justinschuh/status/1220021377064849410

dang · on Jan 22, 2020

Please don't copy/paste comments on HN. It lowers the signal/noise ratio and makes for pain when we go to merge duplicate threads. If you want to refer to something you posted elsewhere, please use a link.

Better still, when you see a split discussion, email hn@ycombinator.com so we can merge them. We'll make sure your comment ends up in the winning thread.

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

hurricanetc · on Jan 24, 2020

This policy is odd.

The link could die, the text at the link could change, or the comment at the link could be deleted. Not to mention a comment section full of links is ugly and unreadable.

StackOverflow has the exact opposite approach because they don’t want their site riddled with dead links.

How many people will care enough to send an email to have a comment merged? That really isn’t a solution to whatever “problem” this is.

neonate · on Jan 22, 2020

The other article was

https://www.ft.com/content/916a766a-3d27-11ea-a01a-bae547046...

http://archive.md/lhUeF

mcphage · on Jan 22, 2020

Based on that, won't the presence of facebook on the ITP list mean either you go to Facebook, or that you've been to multiple sites that have checked if you go to Facebook? I.e., won't these techniques soon end up with all false positives?

BoorishBears · on Jan 22, 2020

If you make random domains that only your site references, and they're on someone's list, then you know it's your site

mcphage · on Jan 23, 2020

You don't need this to determine that someone goes to your site—they're there. This is for tracking people as they go to other sites, and for determining what other sites the people on your site have gone to.

Wowfunhappy · on Jan 23, 2020

Why the counter in the first place? I'd rather they block cookies from any domain I'm not currently viewing.

_bxg1 · on Jan 22, 2020

So unlike yesterday's Apple news this is a subtle flaw, not a decision they made

xenospn · on Jan 22, 2020

Seems like it.

lmkg · on Jan 22, 2020

There is a fundamental difficulty when trying to implement privacy: A limit on the disclosure of information is itself a disclosure of information.

A good privacy design needs to confront this issue directly. Sometimes there's nothing to be done. I think in some cases it's mathematically unsolvable (cf. Cynthia Dwork's paper on Differential Privacy). But an explicit consideration can at least surface some trade-offs. The more fine-grained and selective your redactions, the more information they reveal.

qmmmur · on Jan 22, 2020

It makes me think of password requirements. Isn't it bad to earmark a password as requiring certain things rather than to let the possibilities be completely open?

function_seven · on Jan 22, 2020

From a strictly mathematical view, yeah. Requiring that certain character classes exist in the password will shrink the search space for that password.

But: (1) only a tiny teeny little bit, and (2) the gains in password complexity are probably worth a lot more.

Imagine you have a site that allows numeric PINs between 4 and 6 digits. And another site that requires exactly 6 digits.

Technically, the search space is larger for the first site. An attacker would have 1,110,000 possible codes to check, whereas the 2nd site has 1,000,000. But ensuring that all users are in the 1,000,000 search space is worth it to prevent some users' 4-digit PINs being cracked.
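The arithmetic above can be checked with a short sketch (the function name `pin_space` is just for illustration):

```python
def pin_space(min_len, max_len):
    """Number of numeric PINs with a length between min_len and max_len inclusive."""
    return sum(10 ** n for n in range(min_len, max_len + 1))


variable_site = pin_space(4, 6)  # 10**4 + 10**5 + 10**6
fixed_site = pin_space(6, 6)     # 10**6
weakest_users = pin_space(4, 4)  # space an attacker faces for a 4-digit PIN

print(variable_site, fixed_site, weakest_users)
```

The first site's total space is 11% larger, but anyone who picks a 4-digit PIN there sits in a space of only 10,000 codes, which is the trade-off being described.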

earenndil · on Jan 23, 2020

Consider two websites: website A and website B. Website A places no limitations on passwords except that they all have to be from the base64 character set, and be 1-30 characters in length (inclusive). Website B says all passwords must be 8-30 characters long and contain at least one number and one special character.

Technically, there are 1556820866911379157697368408533647424628560378091278400 possibilities for a given password from the first site, and only 1553740989173808677121103544993503115087947728215015424 for the second site. That's only 0.2% fewer total passwords. However, consider that the typical user's password is probably 6-8 characters and contains only lowercase letters; that means that most users from the first website have only 217167790528 possibilities, while users from the second website--even assuming they only go the bare minimum of 6 lowercase characters + one special character + one number--have 345985669120 password possibilities, which is about 60% more. And that's with the artificial base64 limitation; if you open it up to the full complement of 30 special characters it's significantly more.
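The two "typical user" figures can be reproduced with a small sketch. The breakdown of the second figure is an assumption about how it was derived: 6 lowercase letters, one digit, and one of the two base64 special characters (`+` and `/`), with the digit and the special character placed in any two distinct positions of an 8-character password.

```python
LOWERCASE = 26
DIGITS = 10
BASE64_SPECIALS = 2  # '+' and '/'

# Site A, typical user: 6 to 8 lowercase letters.
site_a_typical = sum(LOWERCASE ** n for n in range(6, 9))

# Site B, bare minimum: 8 characters, of which 6 are lowercase letters,
# one is a digit and one is a special character. There are 8 * 7 ordered
# ways to choose the positions of the digit and the special character.
positions = 8 * 7
site_b_minimum = LOWERCASE ** 6 * DIGITS * BASE64_SPECIALS * positions

ratio = site_b_minimum / site_a_typical
print(site_a_typical, site_b_minimum, round(ratio, 2))
```

Under these assumptions the ratio comes out just under 1.6, matching the "about 60% more" estimate.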

zie · on Jan 22, 2020

The US organization NIST finally got the memo, and now recommends no password requirements, and no length limits, and encourages long, random passwords. My passwords are generally 64 chars of random, when I can get away with it.

strbean · on Jan 22, 2020

Some day we'll start seeing password requirements like:

> Your password must be exactly 6 characters

> The first character must be one of: a, b, c, d

> The second character must be one of: !@#$

> ...

rasz · on Jan 23, 2020

Last time Google researchers made similar discoveries, 2012, it was used to ... track users :-)

https://www.ghacks.net/2012/02/21/microsoft-google-is-also-b...

"We used known Safari functionality to provide features that signed-in Google users had enabled. It’s important to stress that these advertising cookies do not collect personal information."

and bypassing IE third party cookie protection: "impractical to comply with Microsoft’s request while providing modern web functionality." Google says complying with tracking protection is Impractical!

_underfl0w_ · on Jan 22, 2020

Haven't read TFA yet, but at first glance this sounds similar to the approach used by the "Privacy Badger" browser extension - if it sees the same tracker on multiple sites, it "learns" and begins blocking it. Would it also be susceptible to similar information leaks with this threat model?

erichocean · on Jan 22, 2020

noizejoy · on Jan 22, 2020

I’ve been following privacy issues and technology for a while, but haven’t come across a foundational discussion of (a) the merits of and (b) technical implementations of different approaches to avoid fingerprinting:

“hiding” vs “blending in” (making me look identical to countless others - maybe even randomizing who I look like in a smart way).

I wonder if any subject area experts reading this thread would be willing to share a summary of their knowledge and thoughts here.

MrScurt · on Jan 22, 2020

There are countless add-ons to stop fingerprinting, ad-tracking, disable WebRTC, and force things such as HTTPS. As you've touched on above, these add-ons can be used as unique identifiers to attach your activity to your 'profile'.

Just a thought: I think the route Mozilla is taking is where the industry is heading. More open-source/transparency means more privacy protections for the user. If we get to the point where every browser has built in security features, fingerprinting becomes more of a challenge.

Websites themselves could provide the functionality of a data broker. I am perfectly okay if I get suggested products by a company that already has my data.

In my honest opinion, the current landscape is more than hostile towards the average user and needs immediate course correction.

mirimir · on Jan 23, 2020

As Mirimir, I don't worry at all about fingerprinting. Because that persona is totally focused on privacy and anonymity stuff. Perhaps unusually so, but so it goes.

But then, here's the thing. My other personas are similarly focused, but on other stuff. And they don't use English.

A determined global adversary could link them through traffic analysis. But it's a big Internet, and I don't make it easy.

nattaylor · on Jan 22, 2020

Conversely, Chrome is heading in the right direction:

>Chrome plans to more aggressively restrict fingerprinting across the web. One way in which we’ll be doing this is reducing the ways in which browsers can be passively fingerprinted, so that we can detect and intervene against active fingerprinting efforts as they happen. [0]

This will include things like restricting the volume of Browser API checks allowed, etc, to reduce the number of bits that can be used in a fingerprint.

[0] https://blog.chromium.org/2019/05/improving-privacy-and-secu...
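For a sense of what "bits" means here, a minimal sketch: an attribute value shared by a fraction p of users carries -log2(p) bits of identifying information, and bits from independent attributes add up. The shares below are made-up illustrative numbers, not measurements, and the attribute names are hypothetical.

```python
import math


def surprisal_bits(share):
    """Identifying information, in bits, of a value shared by `share` of users."""
    return -math.log2(share)


# Hypothetical shares of users having the same value for each attribute.
attribute_shares = {
    "timezone": 1 / 2,
    "screen_size": 1 / 4,
    "font_list": 1 / 8,
}

# Assuming the attributes are independent, their bits simply add.
total_bits = sum(surprisal_bits(p) for p in attribute_shares.values())

# Users that `total_bits` bits can tell apart.
distinguishable = 2 ** total_bits
print(total_bits, distinguishable)
```

Restricting how many such attributes a page can read caps the total, which is the idea behind limiting the volume of API checks.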

robertoandred · on Jan 22, 2020

Chrome is just trying to start catching up to where Safari and Firefox are.

wiredfool · on Jan 22, 2020

Chrome is restricting fingerprinting, but they still ship google analytics in the browser itself so it's harder to block.

They'll only really block fingerprinting in their browser when they have no use for it.

sroussey · on Jan 22, 2020

Source for the claim of google analytics in the browser itself?

Zhenya · on Jan 22, 2020

I do not see connections on my network to google when I open and browse to 3rd party sites.

Can you show me that's true? If it is, that's fairly interesting.

wiredfool · on Jan 23, 2020

Look in the inspector of a page using GA, and you’ll see it’s served from within the browser, rather than as a download from the network.

erichocean · on Jan 22, 2020

First I'm hearing of this. Citation?

summerlight · on Jan 22, 2020

Wow. I understand ITP's high-level design, but didn't know its implementation was so naive. Maintaining a global database with a few rules that can be easily reverse engineered, and giving any document access to it? How did it get through the internal review process? Does Apple have any privacy/security review process for its major products?

I understand that privacy engineering is very hard and can sometimes be far from obvious, with implicit statistical dependency chains, but this kind of direct problem could (or should?) be caught at an early stage of design. Anyway, ITP is all about privacy and deserves attention from dedicated privacy engineers.

anoncareer0212 · on Jan 22, 2020

Things started getting explicitly dangerous a couple of years ago. Internally we always did no wrong; externally, everyone was praising us for being the one company focused on privacy... when pretty much everyone who cared to think about it knew why we didn't encrypt iCloud backups, and knew we were collecting App Store searches, News articles viewed, and location for ad targeting (this is easily found in public documentation). I left shortly after I realized how little my colleagues knew, cared, and were willing to think about it: a manager on Safari refused to believe that data was being collected, refused to read our documentation on it, and told those concerned that we needed to read up on differential privacy. (Note: that didn't apply at all in the conversation; they were reaching for buzzwords they remembered.)