"The security properties of a collision resistant hash function, ensure that a modification results in a very different hash."
I really appreciate the clarity of this post. The author is building up the groundwork without skipping steps that may be obvious to many readers. I of course knew the purpose of a hash before reading the article, but some people don't - and that sentence clearly let those users know why the hash matters without making it less readable for knowledgeable readers.
Honestly, I thought this line wasn't very clear, at least if I understand it correctly. It is not important that a modification produces a "very" different hash; even a minimally different hash is still different enough. What is important is that it is computationally infeasible to generate a collision. So if your evil plan is to modify someone's .js file and then play with comments/whitespace until the hashes match, you and the website will both be dead before you find the collision.
No, its being "very different" means you can't do things like generate a hash of all dictionary entries and identify when someone has a password that's only a slight change from a known dictionary entry.
It's absolutely important to the security of hashes.
The avalanche effect is a sign of a good hashing algorithm, but not necessarily a property of all of them.
It is indeed vital in the case of SRI, because SRI is also intended to shield against a potential MITM somewhere in the CDN stack. But SRI is useful in other areas too: for example, for really bare-bones version handling, and for handling (somewhat gracefully) corrupted CDN responses (including errors, empty responses, etc.).
For that, the avalanche effect is not all that necessary, and CRC32 could do an OK job too.
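To make the hash talk concrete, here's a minimal Node.js sketch of how an SRI-style digest is computed (sha384, base64-encoded); the filename is just a placeholder:

    // Minimal sketch: compute an SRI-style digest for a local file with Node.js.
    // "jquery.min.js" is just a placeholder filename.
    const crypto = require('crypto');
    const fs = require('fs');

    const body = fs.readFileSync('jquery.min.js');
    const digest = crypto.createHash('sha384').update(body).digest('base64');

    // SRI integrity values take the form "<algorithm>-<base64 digest>":
    console.log(`integrity="sha384-${digest}"`);
    // Flipping even a single byte of the file yields a completely different
    // digest (the avalanche effect discussed above), so any modification,
    // however small, fails the check.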
If you use webpack, just drop in webpack-subresource-integrity [0] for basically "free" SRI tags on all scripts.
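To give a rough idea, a config using that plugin looks something like the sketch below; the option names here are from an older version of the plugin's README, so double-check against the version you actually install:

    // webpack.config.js -- illustrative sketch only; check the plugin's README
    // for the exact API of the version you install.
    const SriPlugin = require('webpack-subresource-integrity');

    module.exports = {
      output: {
        // SRI on cross-origin scripts requires CORS-enabled loading.
        crossOriginLoading: 'anonymous',
      },
      plugins: [
        new SriPlugin({
          hashFuncNames: ['sha384'],  // digest(s) to emit in the integrity attribute
          enabled: process.env.NODE_ENV === 'production',
        }),
      ],
    };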
It's not really as useful if you are serving your static assets from the same place as the HTML (and you always use HTTPS), but if you load your JS/CSS from another server, SRI can still provide some protection.
It used to be, briefly. Unfortunately it had to be removed again, because create-react-app has a zero-configuration policy and SRI can break pages served without TLS unless Cache-Control: no-transform is set on the server [1].
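For anyone wondering what that fix looks like, the header itself is a one-liner; here's a hypothetical Express setup (the framework is just an example, only the Cache-Control value matters):

    // Hypothetical Express setup: ask intermediaries (e.g. carrier proxies on
    // plain HTTP) not to rewrite responses, since any transformation would
    // break the SRI hashes.
    const express = require('express');
    const app = express();

    app.use((req, res, next) => {
      res.set('Cache-Control', 'no-transform');
      next();
    });

    app.use(express.static('build'));
    app.listen(3000);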
It'd be cool if the browser used this to allow cross-origin caching as well.
Say I previously loaded a page that included jQuery from CDNJS, and now I'm in China and another site tries to load jQuery from Google's CDN.
Currently that request would get blocked by the Great Firewall. But since the browser should know that this file matches one it has seen (and cached) before, it should be able to just serve the cached file.
This could also save a network request even if I'm linking to a self-hosted file on my own servers if I include the hash.
The potential problem I see with this is that it could be abused for a "have you loaded this resource" privacy leak. Simply pick a unique script on a website; if my server doesn't get a hit, then I know you went there before.
A possible solution is to have a content hash proxy trusted by the user but shared between multiple users. Then the site can only get the data at the proxy level rather than the user level, and not even that if the proxy is large enough to justify crawling the web to pre-cache everything, or is behind a larger cache that does.
The browser would know if it had a content hash proxy configured and then could use it for all content hashed data. The issue becomes getting people to use one, but partial uptake is better than nothing. You could at least get most corporate and education environments with some equivalent to WPAD[1], and maybe even some consumer-level ISPs the same way if they want to reduce the traffic over their network.
Those are abusable by the first party only though, aren't they?
This abuse isn't just limited to fingerprinting from 1st-party sources; it essentially extends to 3rd-party history detection.
If I visit website A it should not know that I also visited website B. HTTP or HTTPS is irrelevant - they are totally separate connections and this does not assume anyone is in the middle.
Of course, there are attacks that already work using cache timing, but that isn't a good thing.
I know a handful of attacks that work, and a few more vectors that won't be fixed either. So why bother, exactly? It's neither a big concern nor fixable in the current web design.
More detail on this: in a browser's current usage model, this is vulnerable to a 'cache origin confusion attack'. See this thread [1]. It's a bit hard to follow, so perhaps see these posts [2][3], which state the problem succinctly. Let me adapt the text from [3]:
The problem is that www.victim.example/evil.js doesn't exist, and never did, but your browser won't know that if it's in the cache -- this gives you a way of faking files existing on other servers at the URL of your choice, and as long as they're in the cache you'll get away with it.
1. you visit evil.example and the browser stores evil.js with the cache key "foo".
2. you visit victim.example, which has an XSS vulnerability, but victim.example thinks it is safe because it uses Content Security Policy and does not allow inline scripts or scripts from evil domains.
3. the XSS attack is loading <script src=www.victim.example/evil.js hash=foo>
4. the browser detects that "foo" is a known hash key and loads evil.js from the cache, thinking that the file is hosted on victim.example - when the file is in fact not even present there.
5. the evil.js script executes in the context of victim.example, even though they use a Content Security Policy to prevent XSS from being exploitable.
It seems like the problem is that you need to verify that the file actually exists there with the specified hash, but that isn't the same as having to download it. Is there a way to ask the server for just the hash of a file? You would still have the round trip to the server, but it would be a request for a 32 byte content hash rather than several orders of magnitude more data than that.
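There sort of is: an HTTP HEAD request gets you existence without the body, and RFC 3230 defines Want-Digest/Digest headers for requesting a hash, though hardly any servers implement them. A rough sketch of the idea, not something you can rely on today:

    // Sketch: check that a resource exists (and optionally ask for its digest)
    // without downloading the body. Want-Digest/Digest come from RFC 3230 but
    // are rarely implemented, so the digest will usually come back null.
    async function resourceExistsWithDigest(url) {
      const res = await fetch(url, {
        method: 'HEAD',
        headers: { 'Want-Digest': 'sha-256' },
      });
      return {
        exists: res.ok,
        digest: res.headers.get('Digest'),
      };
    }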
One of the replies to my similar post in the earlier thread [5] was from btrask who proposes a similar scheme, using a manifest placed in .well-known/, on the issue tracker for the W3C SRI spec [6]. The conversation is still ongoing, but see that for advantages and drawbacks, as well as other proposed solutions (and/or join the conversation!)
More people should use CSP. It doesn't mean you have to stop worrying about XSS entirely, but it means that when it happens CSP can change it from being "a potentially wormable exploit that could be used to leak many users' data and kill your company" to "slightly annoying way the page formatting can be broken".
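For reference, a fairly typical policy that forbids inline scripts and whitelists one CDN looks something like this (plain Node http used purely as an example; adjust the source list to your own site):

    // Example only: send every response with a CSP that disallows inline
    // scripts and restricts script sources to the page's own origin plus one
    // CDN. The source list here is illustrative.
    const http = require('http');

    http.createServer((req, res) => {
      res.setHeader(
        'Content-Security-Policy',
        "default-src 'self'; script-src 'self' https://cdnjs.cloudflare.com"
      );
      res.end('<html>...</html>');
    }).listen(3000);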
Like any feature, there are a lot of weird, edge-casey parts of SRI that can be abused.
We blogged about adblock detectors over here [1] a couple of months ago for BugReplay [2]. A major adblock detector, FAdBlock, was using subresource integrity to detect whether its payload was being blocked, i.e. when an ad blocker was blocking it from being loaded.
> It'd be cool if the browser used this to allow cross-origin caching as well.
As a caveat, there's info leaking here depending on whether the cache hits or misses, so this would need to be opt-in from the cache source, e.g. you set up subresource integrity and also say "allow other domains to load this resource from the cache."
Call it subresource sharing?
Edit: opt-in would need to be on both "sides" (sharer and sharee).
This still wouldn't be enough, because you could still target the unique mix of scripts/script versions some specific site has, and have a high chance of being accurate, or at least high enough for statistical purposes.
If the site doesn't want to leak that information, it doesn't participate in cache sharing. Since sharing is opt-in, sites won't unknowingly leak this information.
Edit: whoops. I see what you mean. I missed an edit while modifying an earlier draft and left the opt-in only on one side, the sharer.
Given the accumulation of scripts from many sites it might work out alright; those specific versions could come from anywhere - it's not like only one site around uses a specific version of jQuery or something. It might be enough, but it would need testing to prove it out either way.
Not just CDNs; there are benefits to rolling out SRI for lots of your third parties.
Your Stripe JS, scary ad networks' JS, front-end analytics companies. SRI is really neat and helps protect you from any of these many 3rd parties being pwned.
Well, if Stripe wants you to stop accepting payments, you will. They already can take the resource offline, I don't see how this is a new/additional problem.
If they want you to stay up-to-date, they'll provide a piece of PHP/Node that emits the latest URL/SRI tag.
You, as the web developer, want to start using subresource integrity. Stripe, as a depended-upon 3rd party, has not yet bought into the subresource-integrity hype train.
Stripe rolls out a fix for a security issue or other bug in their JS. This breaks your subresource-integrity check. They didn't want you to stop accepting payments; they wanted to fix a vuln.
That hampers the usefulness of using subresource integrity on 3rd-party resources today (which is what yeldarb suggested). Perhaps in the future the 3rd party would provide a script that emits the URL/SRI, but that isn't the case today.
If you're not updating the PHP/Node.js library, and you're not updating any data you give it, where does it get the information it needs to update the URL/SRI tag?
And if you are doing any of those things when Stripe pushes an update, how is it any different than having to update the URL/SRI tag?
Ok, so how about caching the Stripe script and serving it yourself, then polling for new versions and then updating the cache and the SRI?
You might break payment for $polling_interval if the script is incompatible with Stripe's server, so perhaps you could have retry logic there, to bridge $polling_interval more smoothly.
You could also manually review the new Stripe code this way, by polling only by hand or by not automatically updating the cache and SRI.
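The polling approach could look something like this hypothetical Node sketch (Node 18+ assumed for the global fetch; the URL and paths are placeholders, and the automatic write could be swapped for a manual review step):

    // Hypothetical "cache it yourself and poll" sketch of the approach above.
    const crypto = require('crypto');
    const fs = require('fs');

    const SCRIPT_URL = 'https://js.stripe.com/v3/';   // example third-party script
    const CACHE_PATH = './public/vendor/stripe.js';   // self-hosted copy
    const SRI_PATH = './config/stripe-sri.txt';       // integrity value your templates read

    async function poll() {
      const res = await fetch(SCRIPT_URL);
      const body = Buffer.from(await res.arrayBuffer());
      const integrity = 'sha384-' +
        crypto.createHash('sha384').update(body).digest('base64');

      const current = fs.existsSync(SRI_PATH) ? fs.readFileSync(SRI_PATH, 'utf8') : '';
      if (integrity !== current) {
        // New upstream version: update the cached copy and the SRI value
        // together -- or stop here and review the diff by hand first.
        fs.writeFileSync(CACHE_PATH, body);
        fs.writeFileSync(SRI_PATH, integrity);
        console.log('Updated cached script, new integrity:', integrity);
      }
    }

    setInterval(poll, 60 * 60 * 1000); // the $polling_interval mentioned above
    poll();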
Except that when the third party changes that script you'll stop loading it. Unless they adopt strict versioning in their files you'll have to be careful.
I thought those 3rd-party links are just to make it easier to get started. I've been checking Javascript files into my project repository, since it seemed unwise to add N different points of failure for no reason.
And anything important like your financial provider is usually very risk-averse to breaking changes, and should give plenty of notice for an update. And if you're including a hash, you don't care about automatic updates anyways.
I can understand trusting one CDN for performance reasons. But do people really add so many different dependencies on their sites? Should I be doing that instead?
I don't think so. Especially with HTTP/2, fetching resources from the same domain will be very fast.
I guess the main audience for CDNs is huge sites that will see immediate benefits. For them, SRI is good. But using a CDN from day 1 seems to me like a premature optimization.
SRI solves some problems (CDN compromise) in exchange for a different set of problems (resource not loaded, what now?). An old saying comes to mind: "I had a problem and decided to use regular expressions... Now I have two problems."
By combining this with an extended form of SRI that supports hashes signed by a private key, it would be possible to bring the security model of web apps almost up to the level of desktop apps. You might still have to Trust On First Use whatever key/identity was signing a given version of a web app (at least the client-side component of it) but a browser could ask you "Do you want to update to version X.Y of this web app?" before running any JavaScript that you might want to check the release notes or reviews for first.
Ideally this would be combined with something like Binary Transparency, where the new version has to have appeared in a public log for some time, and with no trusted third parties publishing a "Do not trust version X.Y" warning in another public log, acting as a sort of distributed immune system for the web.
That's a bug in ycombinator's linkification code. A quite frustrating one, since <> is _the_ standard way to delimit URLs in plaintext (going back to earlier than section 2.2 of RFC 1738, back in 1994!), so having linkifiers that still fail to respect it is really unfortunate.
Yes, there is. Consider a URL followed by punctuation. How do you tell whether that comma, period, question mark, etc should be included in the URL or not? Had I put in a link to https://www.ietf.org/rfc/rfc1738.txt in my original comment instead of "RFC 1738", there would have been a comma right after the URL, for example.
The possible solutions to this punctuation-following-URL problem are that you delimit the URL, contort your sentence so the URL is followed by a space and some other words instead of punctuation, start adding random whitespace after the URL but before the punctuation to avoid the linkifier eating the punctuation, or stop putting URLs in plaintext. I've seen all of these used; the first solution is by far the best.
Oh, and that's all from a Western perspective. If, on the other hand, you're using a language that does not use space-separated words (e.g. a number of East Asian languages), then delimiting becomes even more important, because you can't just guess that the URL ends at the space character; there are no space characters around.
I can't speak to your experience seeing or not seeing this syntax, but as I said it's been part of the URL RFCs for over two decades, is used in other RFCs where URLs can appear (e.g. the Link header syntax), and is reasonably commonly used by people who both put URLs in their email and want to punctuate it properly. I will grant that proper punctuation is out of fashion in certain demographic groups. As is writing plaintext, I guess.
This might have been mentioned somewhere else, but will browsers remove or make an exception to blocking mixed content [0] when a subresource integrity check is present? I mean, there really is no reason to be paying the TLS overhead for commonly used libraries.
You might not, but some may. Others still may care about keeping their ISP or the public WiFi they are on from seeing they are downloading some scripts from CDNs. Things like that are great for fingerprinting (only 2 sites happen to use these 6 scripts with these specific versions, so if someone downloads those they are on one of those sites).
Plus it still leaks tons of data in the headers. Request time, cache length, user agent, cookies (maybe, hopefully not), Accept-* headers, If-Modified-Since leaking the last download time, and possibly a lot more.
This only helps for JavaScript (and soon CSS) resources.
If your HTML goes through a CDN (say, you use the full Cloudflare package), the CDN can of course just remove or modify these integrity attributes, or add new scripts altogether.
Date tags are only relevant when the information they contain is of a time sensitive nature. This article would say exactly the same thing if it was written today.
Someone who reads the title and thinks "Hmm, this new article sounds like that Mozilla security engineer's article from a few years ago" would still appreciate the date tag.
Or, even better, avoid CDNs? It might even be cheaper when you account for the extra work... and faster when you don't have to load data from 10 servers to load just 1 web page.
I wonder if it would be a good idea, when SRI detects a modified JavaScript file, for a warning to be presented to the web user?
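Today the browser only reports the failure in the devtools console and refuses to run the script, but a page can at least notice the failed load itself via the element's error event and show its own warning. A rough sketch (reportFailure and the URL/digest are placeholders):

    // Sketch: surface SRI failures to the user instead of only to the console.
    const script = document.createElement('script');
    script.src = 'https://cdn.example.com/lib.js';          // placeholder URL
    script.integrity = 'sha384-EXPECTED_DIGEST_GOES_HERE';  // placeholder digest
    script.crossOrigin = 'anonymous';

    script.onerror = () => {
      // The error event fires for network failures *and* integrity mismatches,
      // so this can't tell the user which one happened -- only that the
      // resource didn't load intact.
      reportFailure('A script on this page failed to load or failed its integrity check.');
    };

    document.head.appendChild(script);

    function reportFailure(message) {
      // Placeholder UI: in practice this might be a banner, a beacon back to
      // your servers, or both.
      console.warn(message);
    }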
SRI shouldn't use static hashes; it should specify the public keys of different people, and the response must carry N-of-M signatures. This way updates are possible, and you know N people confirmed the source as safe.
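Nothing like this exists in SRI today, so purely as an illustration of the idea, an N-of-M check over Ed25519 signatures of the resource bytes could look roughly like this (all names hypothetical):

    // Hypothetical N-of-M signature check -- not part of SRI.
    // trustedKeys: crypto.KeyObject public keys of the reviewers.
    // signatures: Buffers produced by the corresponding private keys over the
    // resource bytes.
    const crypto = require('crypto');

    function enoughSignatures(resourceBytes, signatures, trustedKeys, threshold) {
      let confirmed = 0;
      for (const key of trustedKeys) {
        // Count a reviewer as confirming if any provided signature verifies
        // under their key (null algorithm selects Ed25519 in Node's crypto).
        const ok = signatures.some((sig) => crypto.verify(null, resourceBytes, key, sig));
        if (ok) confirmed += 1;
      }
      return confirmed >= threshold; // e.g. require 2 of 3 reviewers to have signed
    }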
Writing clarity matters.