"The security properties of a collision resistant hash function, ensure that a modification results in a very different hash."
I really appreciate the clarity of this post. The author is building up the groundwork without skipping steps that may be obvious to many readers. I of course knew the purpose of a hash before reading the article, but some people don't - and that sentence clearly let those users know why the hash matters without making it less readable for knowledgeable readers.
Honestly, I thought this line wasn't very clear, at least if I understand it correctly. It is not important that a modification produces a "very" different hash; even a minimally different hash is still different enough. What is important is that it is computationally infeasible to generate a collision. So if your evil plan is to modify someone's .js file and then play with comments/whitespace until the hashes match, you and the website will both be dead before you find the collision.
No, its being "very different" means you can't do things like generate a hash of all dictionary entries and identify when someone has a password that's only a slight change from a known dictionary entry.
It's absolutely important to the security of hashes.
The avalanche effect is a sign of a good hashing algorithm, but not necessarily a property of all of them.
It is indeed vital in the case of SRI, because SRI is also intended to shield against a potential MITM somewhere in the CDN stack. But SRI is useful in other areas too: for example, for really bare-bones version handling, and for handling (somewhat gracefully) corrupted CDN responses (including errors, empty responses, etc.).
For that, the avalanche effect is not all that necessary, and CRC32 could do an OK job too.
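To make the hash talk concrete, here's a minimal Node.js sketch of how an SRI-style digest is computed (sha384, base64-encoded); the filename is just a placeholder:

    // Minimal sketch: compute an SRI-style digest for a local file with Node.js.
    // "jquery.min.js" is just a placeholder filename.
    const crypto = require('crypto');
    const fs = require('fs');

    const body = fs.readFileSync('jquery.min.js');
    const digest = crypto.createHash('sha384').update(body).digest('base64');

    // SRI integrity values take the form "<algorithm>-<base64 digest>":
    console.log(`integrity="sha384-${digest}"`);
    // Flipping even a single byte of the file yields a completely different
    // digest (the avalanche effect discussed above), so any modification,
    // however small, fails the check.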
If you use webpack, just drop in webpack-subresource-integrity [0] for basically "free" SRI tags on all scripts.
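To give a rough idea, a config using that plugin looks something like the sketch below; the option names here are from an older version of the plugin's README, so double-check against the version you actually install:

    // webpack.config.js -- illustrative sketch only; check the plugin's README
    // for the exact API of the version you install.
    const SriPlugin = require('webpack-subresource-integrity');

    module.exports = {
      output: {
        // SRI on cross-origin scripts requires CORS-enabled loading.
        crossOriginLoading: 'anonymous',
      },
      plugins: [
        new SriPlugin({
          hashFuncNames: ['sha384'],  // digest(s) to emit in the integrity attribute
          enabled: process.env.NODE_ENV === 'production',
        }),
      ],
    };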
It's not really as useful if you are serving your static assets from the same place as the HTML (and you always use HTTPS), but if you load your JS/CSS from another server, SRI can still provide some protection.
It used to be, briefly. Unfortunately it had to be removed again, because create-react-app has a zero-configuration policy and SRI can break pages served without TLS unless Cache-Control: no-transform is set on the server [1].
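For anyone wondering what that fix looks like, the header itself is a one-liner; here's a hypothetical Express setup (the framework is just an example, only the Cache-Control value matters):

    // Hypothetical Express setup: ask intermediaries (e.g. carrier proxies on
    // plain HTTP) not to rewrite responses, since any transformation would
    // break the SRI hashes.
    const express = require('express');
    const app = express();

    app.use((req, res, next) => {
      res.set('Cache-Control', 'no-transform');
      next();
    });

    app.use(express.static('build'));
    app.listen(3000);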
It'd be cool if the browser used this to allow cross-origin caching as well.
Say I previously loaded a page that included jQuery from CDNJS, and now I'm in China and another site tries to load jQuery from Google's CDN.
Currently that request would get blocked by the Great Firewall. But since the browser should know that this file matches one it has seen (and cached) before, it should be able to just serve the cached file.
This could also save a network request even if I'm linking to a self-hosted file on my own servers if I include the hash.
The potential problem I see with this is that it could be abused for a "have you loaded this resource" privacy leak. Simply pick a unique script on a website; if my server doesn't get a hit, then I know you went there before.
A possible solution is to have a content hash proxy trusted by the user but shared between multiple users. Then the site can only get the data at the proxy level rather than the user level, and not even that if the proxy is large enough to justify crawling the web to pre-cache everything, or is behind a larger cache that does.
The browser would know if it had a content hash proxy configured and then could use it for all content hashed data. The issue becomes getting people to use one, but partial uptake is better than nothing. You could at least get most corporate and education environments with some equivalent to WPAD[1], and maybe even some consumer-level ISPs the same way if they want to reduce the traffic over their network.
Those are abusable by the first party only though, aren't they?
This abuse isn't just limited to fingerprinting from 1st-party sources; it essentially extends to 3rd-party history detection.
If I visit website A it should not know that I also visited website B. HTTP or HTTPS is irrelevant - they are totally separate connections and this does not assume anyone is in the middle.
Of course, there are attacks that already work using cache timing, but that isn't a good thing.
I know a handful of attacks that work, and a few more vectors that won't be fixed either. So why bother, exactly? It's neither a big concern nor fixable in the current web design.
More detail on this: in a browser's current usage model, this is vulnerable to a 'cache origin confusion attack'. See this thread [1]. It's a bit hard to follow, so perhaps see these posts [2][3], which state the problem succinctly. Let me adapt the text from [3]:
The problem is that www.victim.example/evil.js doesn't exist, and never did, but your browser won't know that if it's in the cache -- this gives you a way of faking files existing on other servers at the URL of your choice, and as long as they're in the cache you'll get away with it.
1. you visit evil.example and the browser stores evil.js with the cache key "foo".
2. you visit victim.example, which has an XSS vulnerability, but victim.example thinks it is safe because it uses Content Security Policy and does not allow inline scripts or scripts from evil domains.
3. the XSS attack is loading <script src=www.victim.example/evil.js hash=foo>
4. the browser detects that "foo" is a known hash key and loads evil.js from the cache, thinking that the file is hosted on victim.example - when the file is in fact not even present there.
5. the evil.js script executes in the context of victim.example, even though they use a Content Security Policy to prevent XSS from being exploitable.
It seems like the problem is that you need to verify that the file actually exists there with the specified hash, but that isn't the same as having to download it. Is there a way to ask the server for just the hash of a file? You would still have the round trip to the server, but it would be a request for a 32 byte content hash rather than several orders of magnitude more data than that.
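There sort of is: an HTTP HEAD request gets you existence without the body, and RFC 3230 defines Want-Digest/Digest headers for requesting a hash, though hardly any servers implement them. A rough sketch of the idea, not something you can rely on today:

    // Sketch: check that a resource exists (and optionally ask for its digest)
    // without downloading the body. Want-Digest/Digest come from RFC 3230 but
    // are rarely implemented, so the digest will usually come back null.
    async function resourceExistsWithDigest(url) {
      const res = await fetch(url, {
        method: 'HEAD',
        headers: { 'Want-Digest': 'sha-256' },
      });
      return {
        exists: res.ok,
        digest: res.headers.get('Digest'),
      };
    }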
One of the replies to my similar post in the earlier thread [5] was from btrask who proposes a similar scheme, using a manifest placed in .well-known/, on the issue tracker for the W3C SRI spec [6]. The conversation is still ongoing, but see that for advantages and drawbacks, as well as other proposed solutions (and/or join the conversation!)
More people should use CSP. It doesn't mean you have to stop worrying about XSS entirely, but it means that when it happens CSP can change it from being "a potentially wormable exploit that could be used to leak many users' data and kill your company" to "slightly annoying way the page formatting can be broken".
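For reference, a fairly typical policy that forbids inline scripts and whitelists one CDN looks something like this (plain Node http used purely as an example; adjust the source list to your own site):

    // Example only: send every response with a CSP that disallows inline
    // scripts and restricts script sources to the page's own origin plus one
    // CDN. The source list here is illustrative.
    const http = require('http');

    http.createServer((req, res) => {
      res.setHeader(
        'Content-Security-Policy',
        "default-src 'self'; script-src 'self' https://cdnjs.cloudflare.com"
      );
      res.end('<html>...</html>');
    }).listen(3000);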
Like any feature, there are a lot of weird, edge-casey parts of SRI that can be abused.
We blogged about adblock detectors over here [1] a couple of months ago for BugReplay [2]. A major adblock detector, FAdBlock, was using subresource integrity to detect whether its payload was being blocked, i.e. when an ad blocker was blocking it from being loaded.
> It'd be cool if the browser used this to allow cross-origin caching as well.
As a caveat, there's info leaking here depending on whether the cache hits or misses, so this would need to be opt-in from the cache source, e.g. you set up subresource integrity and also say "allow other domains to load this resource from the cache."
Call it subresource sharing?
Edit: opt-in would need to be on both "sides" (sharer and sharee).
This still wouldn't be enough, because you could still target the unique mix of scripts/script versions some specific site has, and have a high chance of being accurate, or at least high enough for statistical purposes.
If the site doesn't want to leak that information, it doesn't participate in cache sharing. Since sharing is opt-in, sites won't unknowingly leak this information.
Edit: whoops. I see what you mean. I missed an edit while modifying an earlier draft and left the opt-in only on one side, the sharer.
Given the accumulation of scripts from many sites it might work out alright; those specific versions could come from anywhere - it's not like only one site around uses a specific version of jQuery or something. It might be enough, but it would need testing to prove it out either way.
Not just CDNs; there are benefits to rolling out SRI for lots of your third parties.
Your Stripe JS, scary ad networks' JS, front-end analytics companies. SRI is really neat and helps protect you from any of these many 3rd parties being pwned.
Well, if Stripe wants you to stop accepting payments, you will. They already can take the resource offline, I don't see how this is a new/additional problem.
If they want you to stay up-to-date, they'll provide a piece of PHP/Node that emits the latest URL/SRI tag.
You, as the web developer, want to start using subresource integrity. Stripe, as a depended-upon 3rd party, has not yet bought into the subresource-integrity hype train.
Stripe rolls out a fix for a security issue or other bug in their JS. This breaks your subresource-integrity check. They didn't want you to stop accepting payments; they wanted to fix a vuln.
That hampers the usefulness of using subresource integrity on 3rd-party resources today (which is what yeldarb suggested). Perhaps in the future the 3rd party would provide a script that emits the URL/SRI, but that isn't the case today.
If you're not updating the PHP/Node.js library, and you're not updating any data you give it, where does it get the information it needs to update the URL/SRI tag?
And if you are doing any of those things when Stripe pushes an update, how is it any different than having to update the URL/SRI tag?
Ok, so how about caching the Stripe script and serving it yourself, then polling for new versions and then updating the cache and the SRI?
You might break payment for $polling_interval if the script is incompatible with Stripe's server, so perhaps you could have retry logic there, to bridge $polling_interval more smoothly.
You could also manually review the new Stripe code this way, by polling only by hand or by not automatically updating the cache and SRI.
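The polling approach could look something like this hypothetical Node sketch (Node 18+ assumed for the global fetch; the URL and paths are placeholders, and the automatic write could be swapped for a manual review step):

    // Hypothetical "cache it yourself and poll" sketch of the approach above.
    const crypto = require('crypto');
    const fs = require('fs');

    const SCRIPT_URL = 'https://js.stripe.com/v3/';   // example third-party script
    const CACHE_PATH = './public/vendor/stripe.js';   // self-hosted copy
    const SRI_PATH = './config/stripe-sri.txt';       // integrity value your templates read

    async function poll() {
      const res = await fetch(SCRIPT_URL);
      const body = Buffer.from(await res.arrayBuffer());
      const integrity = 'sha384-' +
        crypto.createHash('sha384').update(body).digest('base64');

      const current = fs.existsSync(SRI_PATH) ? fs.readFileSync(SRI_PATH, 'utf8') : '';
      if (integrity !== current) {
        // New upstream version: update the cached copy and the SRI value
        // together -- or stop here and review the diff by hand first.
        fs.writeFileSync(CACHE_PATH, body);
        fs.writeFileSync(SRI_PATH, integrity);
        console.log('Updated cached script, new integrity:', integrity);
      }
    }

    setInterval(poll, 60 * 60 * 1000); // the $polling_interval mentioned above
    poll();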
Except that when the third party changes that script you'll stop loading it. Unless they adopt strict versioning in their files you'll have to be careful.
I thought those 3rd-party links are just to make it easier to get started. I've been checking Javascript files into my project repository, since it seemed unwise to add N different points of failure for no reason.
And anything important like your financial provider is usually very risk-averse to breaking changes, and should give plenty of notice for an update. And if you're including a hash, you don't care about automatic updates anyways.
I can understand trusting one CDN for performance reasons. But do people really add so many different dependencies on their sites? Should I be doing that instead?
I don't think so. Especially with HTTP/2, fetching resources from the same domain will be very fast.
I guess the main audience for CDNs is huge sites that will see immediate benefits. For them, SRI is good. But using a CDN from day 1 seems to me like a premature optimization.
SRI solves some problems (CDN compromise) in exchange for a different set of problems (resource not loaded, what now?). An old saying comes to mind: "I had a problem and decided to use regular expressions... Now I have two problems."
By combining this with an extended form of SRI that supports hashes signed by a private key, it would be possible to bring the security model of web apps almost up to the level of desktop apps. You might still have to Trust On First Use whatever key/identity was signing a given version of a web app (at least the client-side component of it) but a browser could ask you "Do you want to update to version X.Y of this web app?" before running any JavaScript that you might want to check the release notes or reviews for first.
Ideally this would be combined with something like Binary Transparency, where the new version has to have appeared in a public log for some time, and with no trusted third parties publishing a "Do not trust version X.Y" warning in another public log, acting as a sort of distributed immune system for the web.
That's a bug in ycombinator's linkification code. A quite frustrating one, since <> is _the_ standard way to delimit URLs in plaintext (going back to earlier than section 2.2 of RFC 1738, back in 1994!), so having linkifiers that still fail to respect it is really unfortunate.
Yes, there is. Consider a URL followed by punctuation. How do you tell whether that comma, period, question mark, etc should be included in the URL or not? Had I put in a link to https://www.ietf.org/rfc/rfc1738.txt in my original comment instead of "RFC 1738", there would have been a comma right after the URL, for example.
The possible solutions to this punctuation-following-URL problem are that you delimit the URL, contort your sentence so the URL is followed by a space and some other words instead of punctuation, start adding random whitespace after the URL but before the punctuation to avoid the linkifier eating the punctuation, or stop putting URLs in plaintext. I've seen all of these used; the first solution is by far the best.
Oh, and that's all from a Western perspective. If, on the other hand, you're using a language that does not use space-separated words (e.g. a number of East Asian languages), then delimiting becomes even more important, because you can't just guess that the URL ends at the space character; there are no space characters around.
I can't speak to your experience seeing or not seeing this syntax, but as I said it's been part of the URL RFCs for over two decades, is used in other RFCs where URLs can appear (e.g. the Link header syntax), and is reasonably commonly used by people who both put URLs in their email and want to punctuate it properly. I will grant that proper punctuation is out of fashion in certain demographic groups. As is writing plaintext, I guess.
This might have been mentioned somewhere else, but will browsers remove or make an exception to blocking mixed content [0] when a subresource integrity check is present? I mean, there really is no reason to be paying the TLS overhead for commonly used libraries.
You might not, but some may. Others still may care about keeping their ISP or the public WiFi they are on from seeing they are downloading some scripts from CDNs. Things like that are great for fingerprinting (only 2 sites happen to use these 6 scripts with these specific versions, so if someone downloads those they are on one of those sites).
Plus it still leaks tons of data in the headers. Request time, cache length, user agent, cookies (maybe, hopefully not), Accept-* headers, If-Modified-Since leaking the last download time, and possibly a lot more.
This only helps for JavaScript (and soon CSS) resources.
If your HTML goes through a CDN (say, you use the full Cloudflare package), the CDN can of course just remove or modify these integrity attributes, or add new scripts altogether.
Date tags are only relevant when the information they contain is of a time sensitive nature. This article would say exactly the same thing if it was written today.
Someone who reads the title and thinks "Hmm, this new article sounds like that Mozilla security engineer's article from a few years ago" would still appreciate the date tag.
Or, even better, avoid CDNs? It might even be cheaper when you account for the extra work... and faster when you don't have to load data from 10 servers to load just 1 web page.
I wonder if it would be a good idea, when SRI detects a modified JavaScript file, for a warning to be presented to the web user?
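Today the browser only reports the failure in the devtools console and refuses to run the script, but a page can at least notice the failed load itself via the element's error event and show its own warning. A rough sketch (reportFailure and the URL/digest are placeholders):

    // Sketch: surface SRI failures to the user instead of only to the console.
    const script = document.createElement('script');
    script.src = 'https://cdn.example.com/lib.js';          // placeholder URL
    script.integrity = 'sha384-EXPECTED_DIGEST_GOES_HERE';  // placeholder digest
    script.crossOrigin = 'anonymous';

    script.onerror = () => {
      // The error event fires for network failures *and* integrity mismatches,
      // so this can't tell the user which one happened -- only that the
      // resource didn't load intact.
      reportFailure('A script on this page failed to load or failed its integrity check.');
    };

    document.head.appendChild(script);

    function reportFailure(message) {
      // Placeholder UI: in practice this might be a banner, a beacon back to
      // your servers, or both.
      console.warn(message);
    }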
SRI shouldn't use static hashes; it should specify the public keys of different people, and the response must carry N-of-M signatures. This way updates are possible, and you know N people confirmed the source as safe.
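Nothing like this exists in SRI today, so purely as an illustration of the idea, an N-of-M check over Ed25519 signatures of the resource bytes could look roughly like this (all names hypothetical):

    // Hypothetical N-of-M signature check -- not part of SRI.
    // trustedKeys: crypto.KeyObject public keys of the reviewers.
    // signatures: Buffers produced by the corresponding private keys over the
    // resource bytes.
    const crypto = require('crypto');

    function enoughSignatures(resourceBytes, signatures, trustedKeys, threshold) {
      let confirmed = 0;
      for (const key of trustedKeys) {
        // Count a reviewer as confirming if any provided signature verifies
        // under their key (null algorithm selects Ed25519 in Node's crypto).
        const ok = signatures.some((sig) => crypto.verify(null, resourceBytes, key, sig));
        if (ok) confirmed += 1;
      }
      return confirmed >= threshold; // e.g. require 2 of 3 reviewers to have signed
    }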
Writing clarity matters.