In addition, having Expires set to a date in the past is not the same as "Cache-Control: no-cache, private". The latter instructs CDNs not to cache the file whereas the former doesn't (the CDN is allowed to cache the file and revalidate it with the origin).
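To make the difference concrete (illustrative values only):

    Expires: Thu, 01 Jan 1970 00:00:00 GMT
        (immediately stale, but a shared cache may still store it and revalidate)
    Cache-Control: no-cache, private
        (no-cache: revalidate before every reuse; private: shared caches such as CDNs must not store it)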
Your use of the term "attack" seems to imply that a malicious client can trigger circular request loops by using a cleverly forged request. But I cannot understand how it could happen, unless the proxy servers are misconfigured. Am I missing something?
The link above describes numerous proxies as not so much misconfigured as miscoded. That is, no configuration should cause a proxy to not apply a Via or to ignore its own Via. Presumably the attacker would be a malicious proxy customer rather than a malicious client. If the proxy customer is not considered as a monolith, then actually control of just one proxy's configuration is enough to turn a chain into a loop.
The threat is a DoS against a group of CDNs as a whole. Particular CDN customers are only vulnerable to the extent that they require an affected CDN's services. If the CF link isn't clear, click through to the paper they reference:
That's not the point here. It's an adversary getting two CDNs to loop each other and launching an attack that way.
If CDN A proxies requests to CDN B and CDN B proxies requests to CDN A then those two will DoS each other fairly quickly.
There is no attacker in between to strip the Via; that would be counterproductive to the attack.
Email has this too; the Received: header. If you manage to get a loop between two MTAs going they will detect it by seeing themselves in the Received: header list.
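For the HTTP side, a minimal sketch of what that check looks like in a forwarding proxy (Go, with a made-up pseudonym "my-cdn-node"; it assumes upstream hops haven't stripped the header):

    package main

    import (
        "log"
        "net/http"
        "strings"
    )

    // viaPseudonym is a made-up identifier for this proxy node.
    const viaPseudonym = "my-cdn-node"

    // forward refuses requests that have already passed through this node,
    // then tags the request so the next hop can make the same check.
    func forward(w http.ResponseWriter, r *http.Request) {
        via := strings.Join(r.Header.Values("Via"), ", ")
        if strings.Contains(via, viaPseudonym) {
            // Our own pseudonym is already present: we're in a loop.
            http.Error(w, "request loop detected", http.StatusLoopDetected)
            return
        }
        r.Header.Add("Via", "1.1 "+viaPseudonym)
        // ... proxy the request to the origin / next hop here ...
    }

    func main() {
        http.HandleFunc("/", forward)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }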
If other CDNs are removing your Via header, then other CDNs are the adversary, but now we're in Crazy Town because in that case they are DoSing themselves as much as they are DoSing you. The threat discussed here is from malicious CDN customers.
What a terrible stance for a company like Fastly to take:
More debatable perhaps is Via, which is required (by RFC7230) to be added to the response by any proxy through which it passes to identify the proxy. This can be something useful like the proxy’s hostname, but is more likely to be a generic identifier like “vegur”, “varnish”, or “squid”. Removing (or not setting) this header is technically a spec violation, but no browsers do anything with it, so it’s reasonably safe to get rid of it if you want to.
Actually, it isn’t “debatable,” since the debate occurred, and a decision was made, and published. That’s what RFCs are for.
To ignore them with such wanton disregard speaks volumes.
Edit: to clarify, I didn't mean that RFCs should not be debated at all, only that disregarding this because "no browsers do anything with it" didn't seem like a good justification or stance.
Not really. Standards are nice, but as time goes on, things change, and we should NEVER only change things 'once a standard says so'. The web is an ever evolving platform, and standards are loosely respected these days anyway. Heck, browsers aren't a standard themselves!
By that logic there isn't any point to standards at all. If we all are supposed to ignore them when we feel like it then what's the point of having them at all?
If there is a standard published for something, follow it or publish your own RFC. Don't just nitpick the bits you want and break clients in the process.
They're provided as guidance. They aren't some kind of internet law. Sometimes contravening standards is harmful; sometimes it's helpful. It's not productive to point at them as if they were dispositive in debates.
Nope. If web standards worked, the benefit would be a stable platform where you can easily create new software that produces or consumes content without breaking anything.
Of course that isn't what is happening at all. Instead we're having the usual heap of politics and ever-faster update cycles. So I'd agree that web standards failed - but not that they were meant as guidance in the first place.
There is a bit more to it though: When a vendor declares compliance to a standard and fails to implement it correctly, then the vendor can be held accountable, and a customer is in a much better position to negotiate a correction. For this reason, standards are also important from a legal perspective.
I do disagree with the parent's wording that it isn't "debatable". At the same time, I think the point being made is that the article disregards the debate that already took place. This shows in the fact that the article talks mainly about how browsers do not use Via. The problem is that the debate around the RFC was about an entirely different use case; per the RFC spec, much of it concerned protocol capabilities.
Thus it may not be useful to the browser. But the article calling its usage debatable in this context is very wrong.
So then re-write and submit an RFC... or go to the IETF and join a WG - be open to discussion and speak up about what needs to be changed.
A MUST is a MUST, in my opinion - and too often there are serious issues in (web) communication because people ignore them as they see fit.
In either event - no one is saying that you should wait to change an RFC (or wording in an RFC) until it's fully deprecated and completely not in use - but a lot of people use RFCs for researching issues, especially in areas they are not 100% familiar with. Coming across a MUST and seeing that some software or hardware vendor doesn't follow it is all too common. This delays certain projects by weeks, sometimes longer, and costs the involved companies lots of money. RFCs exist for a reason... because the approved standards are expected to be followed when creating new things.
Some things are just wrong. I've implemented SIP, a horrible standard. Lots of compatibility issues just from their insistence on a "human friendly" text format alone.
At any rate there are lots of things you just have to ignore, drop, reject, and otherwise muck about with in order to run a sane network. These standards are not written with implementation experience in mind. They're written largely in a vacuum and out of touch. This varies widely across RFCs, so it might not apply to RFCs you like.
Example of a MUST for SIP and HTTP: line folding and comments in headers. Apart from being crap for performance (so much for being able to zero-copy a header value as just a pointer+len) there's zero legitimate use for these "features" of the syntax. Simply rejecting such messages is in your best interest as a network operator.
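As an illustration only (not any particular server's code), rejecting obs-fold instead of unfolding it takes only a few lines in a header reader:

    package main

    import (
        "bufio"
        "errors"
        "fmt"
        "strings"
    )

    var errObsFold = errors.New("obsolete line folding rejected")

    // readHeaders reads header lines up to the blank line and refuses any
    // folded continuation line (one starting with SP or HTAB) outright.
    func readHeaders(r *bufio.Reader) ([]string, error) {
        var headers []string
        for {
            line, err := r.ReadString('\n')
            if err != nil {
                return nil, err
            }
            line = strings.TrimRight(line, "\r\n")
            if line == "" {
                return headers, nil // end of the header block
            }
            if line[0] == ' ' || line[0] == '\t' {
                return nil, errObsFold
            }
            headers = append(headers, line)
        }
    }

    func main() {
        raw := "Host: example.com\r\nX-Folded: a\r\n b\r\n\r\n"
        _, err := readHeaders(bufio.NewReader(strings.NewReader(raw)))
        fmt.Println(err) // obsolete line folding rejected
    }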
This specific point got me as well: I can't really get behind a company - especially a CDN company - telling people to break the spec because "no browsers do anything with it." That is, honestly, really bad advice and I can't take any of their other points seriously.
I think this blog is more of an editorial piece, considering that Fastly will insert two 'Via: 1.1 varnish' headers with shielding enabled, plus a range of additional X- headers. :)
I actually read this whole thing as: we don't disable this ourselves. But if you're using us as the last layer before the user agent, you can dump this header. And our Varnish language makes it easy.
Right - then the IETF adopts approved RFCs and publishes them as internet standards. What we're discussing in this thread are accepted / approved internet standards - not just some random RFC that some dude tossed up on a working group.
It’s not like there is some kind of internet police that’ll come after you if you don’t follow the RFCs, whether IETF approves them or not. You can decide to follow them all the way, partly or not at all.
Or you can debate them as much as you want, or even publish a new and improved version and perhaps people will decide to follow that instead. Or perhaps they’ll just do whatever they feel like.
Apache's mod_deflate doesn't do this (thankfully).
This has an immediately negative impact on performance and, in many cases, cost: the origin server is sending more bytes over the wire, and network transit is often a non-trivial cost for those on AWS, Azure, et al.
Note: I used to work at Cloudflare, and believe they (we!) made the right decision here. There are other mechanisms that can be used to detect proxy loops, and there are also cases where customers may "stack" edge network vendors (migration, specific feature needs, application complexity).
Very interesting link, thanks. I'm not too familiar with this area, but from my understanding of the article, Cloudflare are suggesting that all players in the game need to be compliant, otherwise nobody wins.
So is this Fastly article suggesting a different point of view?
The article mentions that Via is useful while the request is bouncing around among proxies, but isn't useful in responses, which is what the article is about.
They're talking about responses in which Via is technically 'required' but pretty useless. The blog post you linked seems to be about the use of the header in requests.
Saying that a header is useless because it has been deprecated and displaced by a newer header is... misleading at best.
If all you ever code for is the latest version of Firefox and Chrome, you might not understand this, but there's a whole world out there with an astonishing diversity of browsers. (Also, your site is bad and you should feel bad.) Removing X-Frame-Options without first checking if 99.99% of your users' browsers support Content-Security-Policy is just asking for increased risk.
Most of the suggestions in this post are great, but as always, especially when security is involved, you need to assess your business needs yourself.
The suggestion to use Content-Security-Policy over X-Frame-Options is great -- if you don't expect many of your users to be using IE-based browsers. If you're primarily serving large enterprises or government customers though, it's likely that most of your users will still be coming from a browser that doesn't support Content-Security-Policy.
Not to mention that Content-Security-Policy can be costly to set up and maintain properly. My servers send both X-Frame-Options and Content-Security-Policy, but I do keep running into cases where my CSP was too restrictive and have to keep fiddling with it.
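As an illustration, a minimal sketch of that belt-and-braces approach (Go; the SAMEORIGIN / frame-ancestors 'self' policy is just an example):

    package main

    import (
        "log"
        "net/http"
    )

    // withFraming sends both the legacy and the CSP anti-framing headers, so
    // browsers without CSP support still get X-Frame-Options.
    func withFraming(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("X-Frame-Options", "SAMEORIGIN")
            w.Header().Set("Content-Security-Policy", "frame-ancestors 'self'")
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        handler := withFraming(http.FileServer(http.Dir(".")))
        log.Fatal(http.ListenAndServe(":8080", handler))
    }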
P3P is unnecessary until you have clients complaining that Internet Explorer users cannot use the site and it's hurting their business. I speak from experience.
Curiously enough, P3P enforcement depends on the operating system and not on the browser. Internet Explorer 11 may or may not care about P3P depending on whether you're on Windows 7 or Windows 10.
Came here to say the exact same thing. P3P may be "officially" obsolete, but if your business wants older browsers to be able to handle your code, you're going to have to deal with it.
If you have the misfortune of encountering it, you can get really hard-to-detect bugs with ajax calls or script files not getting loaded in IE when you don't have P3P set up correctly. (for instance: https://www.techrepublic.com/blog/software-engineer/craft-a-...)
cache-control doesn't completely replace Expires for some use cases.
If you have a scheduled task that generates data every hour, you can set Expires accordingly so all clients will refresh the data as soon as the hour rolls over.
You can do this using max-age but then you have to dynamically calculate this header per request which means you can't do things like upload your data to s3 and set the cache-control header on it.
With expires, I can upload a file to s3 and set
Expires: ... 17:00
and then not have to touch it again for an hour.
You can work around this client side with per-hour filenames or the other usual cache-busting tricks, but that's annoying.
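To make the comparison concrete, a rough sketch (Go): the Expires value can be written once when the object is uploaded, while the equivalent max-age is relative to "now" and has to be recomputed for every response:

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    func main() {
        now := time.Now().UTC()
        nextHour := now.Truncate(time.Hour).Add(time.Hour)

        // Option 1: a fixed Expires, written once when the object is uploaded.
        fmt.Println("Expires:", nextHour.Format(http.TimeFormat))

        // Option 2: the equivalent max-age, which is relative to "now" and so
        // has to be recomputed for every response.
        fmt.Printf("Cache-Control: max-age=%d\n", int(nextHour.Sub(now).Seconds()))
    }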
I get your point but it's such a niche use case that I can't see it coming up in real world situations. I mean, "never say never", but it's a solution that creates as many problems as it solves.
I used to build online games that fed off real-world events, e.g. football manager games based on real football matches, games based on horse racing, F1, the Tour de France, and many others. We needed to change feeds when the match started and ended, but sometimes events are delayed or run into extra time, so we needed a way to change that quickly. We also needed to present different screens at the start and end of the event than the live scoring shown during the event. This all meant it was easier to handle time-based cut-offs in JavaScript, with the live scoring JSON files (which were being served from S3) using the Cache-Control header, because it was easier to set a timeout X seconds into the future than to rewrite the S3 tags every few seconds with a new Expires header.
On paper our use case should be precisely what you described but even we found expires to be unnecessary.
It seems like kind of an unlikely scenario that you'd want to expire content at a specific time. I mean, if someone chooses to do that, they better know what the impact could be.
With the Expires header, all clients that retrieved that content would expire at the exact same time, which could cause some disproportionately high load in the few seconds after that (the "thundering herd" problem). The Cache-Control solution will stagger the expirations (relative to when the client last retrieved it) so the server doesn't get trampled.
That's a cynical view, and I don't think I said you should depend on Cache-Control working. Yes, there will be bad actors, but the majority of clients are good actors. It's just one of several measures you should take to even out the load.
Of course you'd want a caching layer in front of the server doing the actual work, but it's still possible to "thundering herd" the cache server if you use an Expires header. Even if the herd doesn't hurt your backend server, it can still make the load on your caching frontend servers spike at specific time periods with every good actor refreshing the content at the same time. So it's still ideal to try and even out that load with Cache-Control.
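One common way to even out that load, sketched here with an arbitrary 10% jitter window (not anyone's production code):

    package main

    import (
        "fmt"
        "math/rand"
    )

    // jitteredMaxAge spreads expirations out so clients don't all come back at
    // the same instant; the 10% jitter window is an arbitrary choice.
    func jitteredMaxAge(base int) string {
        return fmt.Sprintf("max-age=%d", base+rand.Intn(base/10+1))
    }

    func main() {
        fmt.Println("Cache-Control:", jitteredMaxAge(3600))
    }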
The use case of having hourly updated data (e.g. weather data) on an S3 bucket behind a CloudFront distribution is not that niche.
Thundering herd may or may not be an issue depending on the amount of traffic you normally get, the architecture of your backend (e.g. AWS Lambda or S3 which can most likely deal with this easily) and the primary purpose of your CDN usage (e.g. caching data closer to the users for faster delivery world wide rather than reducing back end load).
I really wish the browser vendors would come together to establish a plan to clean up User-Agent. It's one of the worst offenders in header legacy[1] and fingerprinting. Exposing what browser I am using and its major version is fine, but I don't think every website I visit deserves to know what OS I am using, nor the details of my CPU.
That requires more steps and a slower process. User-Agent is a one-step process. Negotiating browser capabilities means returning something to the browser and potentially coming back to the server.
While it has obviously been abused, neither way is ideal. There's no way for a server to say "tell me the browser capabilities before I serve you the request".
These days the referrer header rarely makes it through for 2 main classes of reasons [0].
1. Requests transiting across HTTP <-> HTTPS boundaries do not include the referrer header.
2. The referrer header is frequently disabled by sites (especially search engines and high-traffic sites) through the use of a special HTML meta control tag [1]:
<meta name="referrer" content="no-referrer" />
Worry not, though. When client-side Javascript is enabled, ga.js still sends enough information that Google can reconstruct most of everyone's browsing sessions on their backend. Now Google (and only Google) really has all your / our data (generally speaking). :-\
I used to spoof my user-agent and don't remember much of a difference... As a dev, everyone tells me I should just throw literally every possible version of newer attributes into the CSS anyhow, so on most websites you're bound to get at least some of the right ones.
Perhaps your complaint is of a higher order though? Recently I've been spending most of my time wrestling with CSS so my perspective is a bit skewed...
For instance, I just found today that GitHub code reviews require the Referer header to allow PR comments. Without the Referer header, GH returns `422 Unprocessable Entity`.
Server is not a vanity header; it's needed to know WHO THE HELL responded to you (we're in a very messy world of CDN selectors + CDNs + application layers that depend on non-obvious rules about (sub)domains and cookies).
Speaking of HTTP headers, one I wish more people would use is Accept-Language instead of region/geoip-based localization. Practically every site I've come across ignores this header in favour of geoip, with the weird and notable exceptions of Microsoft Exchange webmail and Grafana.
Yes, please! Is there some catch I don't know about that explains why people aren't relying on the header to determine the language served? Because if not, I don't get how geoIP/region is used so widely.
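For what it's worth, a bare-bones sketch of honouring the header (Go; it ignores q-values and region subtags for brevity, which a real implementation shouldn't):

    package main

    import (
        "fmt"
        "strings"
    )

    // pickLanguage returns the first supported language in Accept-Language,
    // ignoring q-values and region subtags for brevity; falls back to "en".
    func pickLanguage(acceptLanguage string, supported map[string]bool) string {
        for _, part := range strings.Split(acceptLanguage, ",") {
            tag := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
            base := strings.ToLower(strings.SplitN(tag, "-", 2)[0])
            if supported[base] {
                return base
            }
        }
        return "en"
    }

    func main() {
        supported := map[string]bool{"en": true, "de": true, "fr": true}
        fmt.Println(pickLanguage("de-CH, de;q=0.9, en;q=0.8", supported)) // de
    }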
I get that this is data that Fastly has to send but doesn’t get to bill directly to customers, but don’t expect ME to care about this until the average news article stops sending me 10 MB.
You seem to be taking this way too critically. It's a simple article that's looking at the typical headers in responses and showing which ones probably are outdated or unnecessary. If you have 10 hits per day, it doesn't matter. For others that send billions of requests, it might just make a material difference.
I wouldn't trust this entry at all. The author did not do proper research to understand the why's behind the headers that he didn't understand or didn't know well enough.
They list "date" as being required by protocol. This is not true. The term used in the RFC is "should". It is a nice to have, for additional validation by proxies.
The term the RFC (RFC 2616, Section 14.18) uses is "MUST", with 3 exceptions (HTTP 100/101 responses, which are message-less; HTTP 500-class errors, which indicate that the server is malfunctioning and during this malfunction it's inconvenient to generate a date; and finally HTTP servers without clocks), all of which cover exceptional cases -- in general, HTTP/1.1 responses MUST include a Date header from the origin server, and proxies MUST add the Date header if the origin server failed to do so (due to 1 or more of the 3 exceptions).
Proxies used to (and some still do) compare last-modified and date, if the date header is present. [0] They are not required to trust this header as accurate.
For reference and clarification around the Date header, the "should" comes from the loophole that nobody is required to have a time source. The previous RFCs made that harder to understand, as the loophole was in another section.
I believe you have mis-interpreted Section 7.1.1.2 of RFC 7231, specifically it is identical to RFC 2616 Section 14.18 in that a Date header MUST be included except for 3 exceptions. They have listed the 3 exceptions first and also the wording that includes "SHOULD" which defines when the origin server should compute the date, but notwithstanding those notes it still notes that the Date header is mandatory for an origin server: "An origin server MUST send a Date header field in all other cases." (where other refers to the 3 exceptions -- HTTP 500-class errors; HTTP 100-class message-less responses; and no-clock systems)
A time source isn't required, a clock is. Further, if the origin server does not have a clock, any proxy (such as HAProxy) is still required to add the Date header if it has a clock, as if it were the origin server. In practice, there are very few functional systems without clocks.
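The proxy-side obligation is tiny in code terms; a sketch (Go; http.TimeFormat is Go's constant for the IMF-fixdate layout HTTP requires):

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    // ensureDate adds a Date header in IMF-fixdate format when the upstream
    // response didn't include one, acting as if this hop were the origin.
    func ensureDate(h http.Header) {
        if h.Get("Date") == "" {
            h.Set("Date", time.Now().UTC().Format(http.TimeFormat))
        }
    }

    func main() {
        h := http.Header{}
        ensureDate(h)
        fmt.Println("Date:", h.Get("Date"))
    }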
I'd imagine its original design was so that the proxy could choose to honor the response date, rather than just use the current time - technically speaking the date header removes state from the proxy, as the proxy doesn't need to know what time it is to honor cache policies.
Oh God. No. Expires and Pragma are absolutely essential if you're writing a web app to be used by folks stuck behind a walled garden proxy implemented in the dumbest way possible.
Not only IE6 but IE11 on Windows 7 as well.
IE11 on Windows 10 does not care about P3P.
Have fun debugging your ajax stuff on that one if those stupid headers fail.
Luckily one can just fill that P3P header with garbage and IE will gobble it up and be happy.
Go figure...
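For what it's worth, placeholders of roughly this shape have reportedly been enough (the exact wording is arbitrary):

    P3P: CP="This is not a P3P policy"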
Healthcare and government (US). So, so very many systems are on IE6. So, so very many websites only work correctly/fully when end users are on that platform. Until you've had to support code distributed by the US federal gov't and watch the percentages of users hitting your site from XP (or earlier) UAs rise to the double digits, you have not known sadness.
It would be helpful to have a guide to this for people running a 'low audience website' where there is no CDN or Varnish, just some Apache or Nginx server on a slow-ish but cheap VPS.
For a local business or community, e.g. an arts group with a WordPress-style site, there are many common problems. They might not need a full CDN; just serving media files from a cookieless subdomain gets their site up to acceptable speed and cuts the header overhead considerably.
Purging the useless headers might also include getting rid of pointless 'meta keywords' and what not.
The tips given here could be really suited to this type of simple work to get a site vaguely performant. A guide on how to do it with common little-guy server setups could really help.
Realistically, how much traffic is saved by cutting headers? A simple article like this one, currently on the HN frontpage (https://tp69.wordpress.com/2018/04/17/completely-silent-comp...), weighs 178 KB, and that's without external resources. Unused headers account at best for 0.1% of the total traffic.
One could argue that the headers comprise a very important 0.1%, but any wasted time the client spends waiting for and parsing headers will almost always be utterly dominated by the unavoidable wait for HTML parsing, JavaScript parsing, painting and so on.
I could see the argument for pruning useless headers if, say, the method for generating them relied on some high-latency database call or filesystem access, but that would rarely be the case.
The details are interesting but "adds overhead at a critical time in the loading of your page" ... this seems pretty unlikely to have any noticeable processing overhead. Doing things better is generally good, but this all seems very low impact.
Depends on where you measure it. A client on a decent connection will never notice. If you're serving billions of hits, 20 bytes in a header is something you will definitely notice on your bandwidth bill.
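Back-of-the-envelope, with purely illustrative numbers: 20 bytes × 1,000,000,000 responses ≈ 20 GB of extra transfer, before counting any per-request CPU spent generating and parsing the header.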
That's a very weak argument to make, though, regarding the modern web; sites are routinely sending me multiple megabytes of content, so optimising 20 bytes away isn't going to make even the smallest dent if you're trying to pack your site down.
I got stuck with a website once that was using one of the compression headers - maybe Content-Encoding - to indicate that its .gz files were gzipped even if the client didn't indicate it supported it. Some browsers would ignore it and just download the file, but others would unzip it. So you got a different file depending on what browser you used! I think wget and Chrome behaved differently from each other. I wrote to the site operator, who corrected it.
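The usual fix, sketched below (Go, with a made-up file path), is to label the body as gzip-encoded only when the client actually advertised gzip, and otherwise serve the .gz as an ordinary download:

    package main

    import (
        "log"
        "net/http"
        "strings"
    )

    // serveGzipped serves a pre-compressed file. It only labels the body as
    // gzip-encoded when the client advertised gzip; otherwise it hands the
    // .gz over as an ordinary download. The file path is made up.
    func serveGzipped(w http.ResponseWriter, r *http.Request) {
        if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
            w.Header().Set("Content-Encoding", "gzip")
            w.Header().Set("Content-Type", "text/plain; charset=utf-8")
        } else {
            w.Header().Set("Content-Type", "application/gzip")
        }
        http.ServeFile(w, r, "data/report.txt.gz")
    }

    func main() {
        http.HandleFunc("/report.txt", serveGzipped)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }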
Bit of a tangent, but Fastly's CTO gave a terrific talk I attended about a year ago, titled something like "Why load balancing is impossible". My career in consulting has led to a gradual diffusion from my earlier focus on front-end performance optimization, but Fastly retains credibility in my book on a number of fronts.
You must have read a completely different article than I did. The one I read was providing a useful resource on obsolete/insecure/dubious but still widely-used HTTP headers.
You seem to be making assumptions about the motivation for the article and then reacting strongly against it, but that's also dubious.
I think you may be taking the headline too seriously; I think the OP is really arguing that these headers constitute a reasonable amount of bandwidth, and maybe we should just switch them off if they're not providing any value?
No, they are getting too many small customers (see the part where now, with a $500/mo commit, you can get a 20-30% discount off the list price without squeezing them hard), and those customers vary on too many headers, which is blowing holes in their caches and making their Varnish-as-a-Service not work as well as it used to.
So it seems they are starting to fall into the propaganda mode to paint over the issue rather than admit that it is time for them to start innovating again.
Surrogate keys and quick cache busting used to be Fastly's special sauce, but since 2014 that has been rather standard.
This is one of the most aggressively worthless posts I've ever read.
This is literally nothing more than a minor blog post that points out that some of us are still using headers we might not need to. Finding anything else in that is utterly baffling.
A very cheap attack is to chain CDNs into a nice circle. This is what Via protects against: https://blog.cloudflare.com/preventing-malicious-request-loo...
Just because a browser doesn't use a header does not make the header superfluous.