Yes, browsers do cache SSL resources, so please use Google's CDN (hoisie.com)
90 points by marketer on Jan 19, 2011 | 35 comments



The downside is that the secure versions take longer to download if you don't get a local cache hit. I did a speed test and loaded jQuery 20 times from Google using my local machine with an unprimed cache. For the http version, the average load time was 90ms. For the https version, the average load time was 192ms. That’s over twice as slow for the SSL version. Since you can't guarantee the cache hit, it's a trade-off that doesn't have an obvious answer.
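
Roughly the kind of test I ran (a simplified sketch, not the exact script; each request opens a fresh connection, so there's no connection reuse or caching the way a real browser would have, and the jQuery version is just an example):

  # Fetch jQuery 20 times over plain http and over https with no caching or
  # connection reuse, then print the average download time for each scheme.
  import time
  import urllib.request

  URLS = {
      "http":  "http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js",
      "https": "https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js",
  }

  for scheme, url in URLS.items():
      times = []
      for _ in range(20):
          start = time.perf_counter()
          urllib.request.urlopen(url).read()  # fresh connection every time
          times.append(time.perf_counter() - start)
      print(scheme, "average: %.0f ms" % (1000 * sum(times) / len(times)))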


So I have an idea. For HTTPS responses that are cached to disk: (1) do we care about the security of the files on disk, and (2) if so, do we do anything to actually secure this content on disk?

If 2 is 'No', could we perhaps encrypt the content with the SSL cert (or some piece[s] of it) before it's written to disk? Say you have an app that you want or need to benefit from caching, but the cached data contains sensitive information you'd like to make harder to get at over the long term. Take some unique bits from the cert (or all of them; certs are fairly small), encrypt the cached data with them, and store a hash so you can look up which site a cached item came from without explicitly recording it. This would let you store the data knowing that if somebody recovered a hard drive full of old cache entries, it would be difficult for them to figure out where the data came from and how to decrypt it.

I realize I'm basically asking to use public information to encrypt private data. This can't be too difficult to 'hack' around, but you'd need to know which website and certificate created the cached copy (so those entries should probably leave the 'history' as soon as the browser exits). I'm not too familiar with SSL certs in general, so to add extra protection you could mix in something like Firefox's Master Password, so the data is 'genuinely' encrypted rather than protected only by material that can be found or guessed, like the SSL cert.
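
A very rough sketch of what I mean (cert_der and master_password are placeholder names, Fernet from the cryptography package is just one convenient cipher, and this glosses over every real key-management question):

  # Derive a key from the site's certificate (public) plus an optional master
  # password (secret), encrypt the cached body with it, and store it under an
  # opaque hash so the cache file doesn't reveal which site it came from.
  import base64
  import hashlib
  from pathlib import Path
  from cryptography.fernet import Fernet  # pip install cryptography

  def derive_key(cert_der: bytes, master_password: bytes = b"") -> bytes:
      digest = hashlib.sha256(cert_der + master_password).digest()
      return base64.urlsafe_b64encode(digest)  # 32 bytes, urlsafe-b64: Fernet key

  def store_entry(cache_dir: Path, url: str, body: bytes,
                  cert_der: bytes, master_password: bytes = b"") -> None:
      name = hashlib.sha256(url.encode() + cert_der).hexdigest()  # opaque filename
      token = Fernet(derive_key(cert_der, master_password)).encrypt(body)
      (cache_dir / name).write_bytes(token)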

Sorry if this is off-topic; it just popped into my head and now I'm curious.


Web developers should declare sensitive data as non-cacheable.


Also, this may be a completely separate issue, but shouldn't persistent cookies set over https also be encrypted on disk in some way? If a bank website was found to be setting persistent cookies over https, I'd sure want the browser to be encrypting that cookie in some way before putting it on my hard disk. Again, I don't know if browsers already support this, but I think they should.


Yes, but how many web developers are in a position to decide how their documents are cached, and of those, how many care to? If it's handled in the web browser, I can control how sensitive data is treated and don't have to rely on every web developer to do the right thing.


The article being argued against says browsers don't cache SSL resources to disk. This is not the same as saying they don't cache them at all. Can someone test and report back the actual behavior?

EDIT: I see from an HN comment by the author that he did mean on disk.


Firefox 3.x caches (to disk) HTTPS responses with Cache-Control: public. Firefox 4 caches HTTPS responses basically the same way it caches non-TLS HTTP responses. Apparently, IE and Chrome are also doing this more aggressive caching. (I work in Mozilla's Platform/Networking team.)
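
So if you want Firefox 3.x to disk-cache your HTTPS static assets too, send Cache-Control: public with a long max-age. A toy illustration only (a bare WSGI app, not how you'd actually serve static files):

  # Toy WSGI app: serve a static asset with Cache-Control: public so that
  # Firefox 3.x will also write the HTTPS response to its disk cache.
  def static_asset_app(environ, start_response):
      with open("jquery.min.js", "rb") as f:  # hypothetical local copy
          body = f.read()
      start_response("200 OK", [
          ("Content-Type", "application/javascript"),
          ("Cache-Control", "public, max-age=31536000"),  # cacheable for a year
      ])
      return [body]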


> Firefox 4 caches HTTPS responses basically the same way it caches non-TLS HTTP responses.

Can you please explain this sentence? I hope that doesn't mean "we now treat https just like http." Thank you.


Why would that matter? HTTPS is about getting it over the wire securely... once it gets to the machine, there are no security guarantees.


I don't want a browser that stores my bank account page to disk (a page that was generated just to be displayed at that moment and never reused) just because "HTTPS is about getting it over the wire securely." I prefer browser defaults that are sensible.

Writing every https response to disk is just wrong.


But your bank account page would never reasonably have caching settings on it. We're talking about static assets specifically set to be cached - images, js, css.


> your bank account page would never reasonably have caching settings on it

Are you aware of how many banks don't have "ideal" developers? Is it easier to "change the whole world" or to use a browser with sane defaults?


The first thing that comes to mind is that it opens up a timing attack (with JavaScript; it might be possible, though more difficult, with server logging). An attacker could find out which (secure) pages you've accessed. Given that there are link coloring tricks to do this anyway, it might not be so bad, but I was under the impression that browsers are starting to close that hole.

Of course, you could always cache-control this behavior away on the server, but not if you don't know about it. I, for one, am glad I found out.


If you don't want your HTTP requests to be cached and/or stored, then you really must use the appropriate Cache-Control directives.
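
For a sensitive page, "appropriate" means roughly the following (a toy WSGI sketch of my own; the Pragma header is only there for ancient HTTP/1.0 clients):

  # Toy WSGI app: directives that tell browsers and intermediaries not to
  # store or reuse a sensitive response.
  def account_page_app(environ, start_response):
      body = b"<html>...sensitive account page...</html>"
      start_response("200 OK", [
          ("Content-Type", "text/html"),
          ("Cache-Control", "no-store, no-cache, must-revalidate"),
          ("Pragma", "no-cache"),  # HTTP/1.0 fallback
      ])
      return [body]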

IIRC, most browsers have cached HTTPS resources in memory (if not on disk) for a long time, so these kinds of side channel attacks have always been possible.

More generally, TLS is not good at handling side-channel (timing, caching, size measurement) attacks. If you want to mitigate side channel attacks then you need to do a lot of tedious work (that is practically impossible with current mainstream tools) at the application/HTTP layer on the server.


I upvoted you: everything in your comment was correct, on-point, and good advice. I'm more worried about all the people who don't follow the best practice. I know a guy who used to run web servers for dozens of clients, who didn't know about HTTP headers before I told him.

For what it's worth, in-memory caching is a totally different animal. You can expect the in-memory cache to keep a typical object for minutes or hours, depending on usage patterns. You can expect the disk cache to keep a typical object for days or weeks, across browser restarts and even system reboots.


If this caching allows a website to switch from using HTTP to HTTPS within its budget, then I think the net effect is very positive. We can't have bad website administrators/developers holding back real security improvements with their incompetence. Really, caching is a very small security impact compared to other problems that such an administrator is likely to cause.


Again, absolutely right. Remember, though, that we're not talking about the capability here: we're talking about the default. A (well-run) website can get all the caching benefits by including an HTTP header. The article is about a well-run website that does exactly this. The default only matters at all for poorly-run websites.

Given that poorly run websites are considerably less likely to be worried about scaling issues, the caching is mostly inconsequential. So, would we prefer to give the poorly-run website a mostly inconsequential security benefit or a mostly inconsequential scaling benefit?


We probably want to remove any excuse for not switching to https. Perceived performance penalties, inconsequential or not, might hold back many sites.


Question: The article says that if a resource is accessed as 'http:' and then as 'https:', then the second access will not hit the cache. Is that true? Thanks.


They are different resources so they are cached separately. There is no standard that says that a cached response for https://foo.org/x can be used for a request to http://foo.org/x.


Those would be different URIs, and thus different URLs, and thus have different caching policies. URIs (and in turn URLs) must be consistent for caching to apply.

Consider:

  https://foo/a.txt != http://foo/a.txt

just like the obvious case of

  https://foo/b.txt != http://bar/c.txt

All of those are considered unique URIs.


It would be interesting to see a comparison of how common and soon-to-be-common browsers [FF3/3.5/4, IE6/7/8/9, Chrom(e|ium), common mobile browsers, ...] deal with HTTPS content by default and in response to the relevant headers: no persistence, short-term persistence (not re-requesting objects currently in use elsewhere in the current document and/or other open windows/frames), up-to-session-long persistence (RAM cache), or long-term persistence (disk cache).

I suspect there will be quite a range of behaviours, especially if you consider IE6 (which unfortunately I have to, as do many others), so a bit more consideration is needed before jumping to change expectations of static content access speeds.

Another bit of my research to add to the list of things that'll get looked into when I have some free time (i.e. when hell freezes over)...


... um, why is marketer's comment here dead?


My guess is the multiple links using url shorteners tripped an anti-spam algorithm somewhere.


Sigh, you shouldn't be relying on Google for this anyway. No company except you should be hosting files critical to your website.


There are good reasons to use the Google version.

A: Google is likely far more reliable than your dinky little outfit, so odds are very high it'll stay up.

B: If you use the Google version, it's probably already in your user's cache saving you nearly an 80k download for a common library and making the first hit to your site potentially much faster.


> A: Google is likely far more reliable...

It's not about reliability. It's about trust. When you cross-site script yourself by giving Google access to the contents of every page on your site, you are entrusting that company with your data and all your customers' data.

No company, including almighty, do-no-evil Google, should be trusted this much.

> B: If you use the Google version, it's probably already in your user's cache saving you nearly an 80k download

This might be offset slightly by your web browser having to make a separate HTTPS connection to a different "secure" host. If you are this concerned about JavaScript load times you should bundle all required javascript into a single file -- one HTTP request to one server will always beat many requests to many servers.


> No company, including almighty, do-no-evil Google, should be trusted this much.

It isn't just about trusting the CDN: relying on popular public static resources like this increases your vulnerability to DNS poisoning attacks.

If some malware manages to redirect requests for Google's static content servers to servers of its own, it could inject a keylogger or code that scans for usernames/passwords/credit-card info into every site the infected users visit that uses Google as a source for libraries like jQuery (even small and/or low-profile sites that would otherwise be unlikely targets).


Fair points.


The article mentions that the HTTPS link to the resources is now the default on Google's list of AJAX resources. I remember back in the day being warned that SSL connections have an appreciable overhead, so you should avoid using SSL unnecessarily. Is this no longer true? Has the overhead of https:// become negligible?


Yes, it has become negligible. Google has a nice talk about it: the major overhead is in the initial handshake, and Google has published how the major issues were in browser behavior (mostly caching) rather than connection speed or CPU cost.


Could you provide a link to that talk?


For those who don't know what the CDN here is supposed to do:

http://blog.patrickmeenan.com/2010/03/google-to-offer-cdn-se...

If I understood it correctly, the idea is to reference web resources in your HTML in such a way that clients can access them with the lowest possible latency, no matter where they are.


Does anyone know if it's possible to use the Google Picasa CDN with SSL?


Yes, if by "caching" you mean, and only mean, "caching in RAM".



