HTTPWTF (httptoolkit.tech)
845 points by pimterry on March 4, 2021 | 138 comments



I'd add to this list:

Chunk extensions. Most people know HTTP/1.1 can return a "chunked" response body: it breaks the body up into chunks, so that we can send a response whose length we don't know in advance, but also it allows us to keep the connection open after we're done. What most people don't know is that chunks can carry key-value metadata. The spec technically requires an implementation to at least parse them, though I think it is permitted to ignore them. I've never seen anything ever use this, and I hope that never changes. They're gone in HTTP/2. (So, also, if you thought HTTP/2 was backwards compatible: not technically!)
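
For anyone who's never seen one on the wire, an extension rides along after the chunk size, separated by a semicolon. A made-up example (7 is the chunk length in hex, foo=bar is the extension, and the final zero-length chunk ends the body):

  HTTP/1.1 200 OK
  Transfer-Encoding: chunked

  7;foo=bar
  Mozilla
  0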

The "Authorization" header: like "Referer", this header is misspelled. (It should be spelled, "Authentication".) Same applies to "401 Unauthorized", which really ought to be "401 Unauthenticated". ("Unauthorized" is "403 Forbidden", or sometimes "404 Not Found".)

Also, header values. They basically require implementing a custom string type to handle correctly; they're a baroque mix of characters & "opaque octets".


Those key-value pairs in chunked encoding ("chunk extensions") are spec'ed to only be hop-by-hop, which makes them more or less completely unsuitable for actual use by end applications. Any proxy or reverse proxy is allowed to strip them. Indeed, it can be argued that a conformant proxy is required to strip them, due to the requirement that unknown extensions MUST be ignored. (I suspect most do not strip them, and there is an argument to be made that blindly passing them through, if not changing the encoding, could be considered ignoring them, but I'm not certain that is actually a conforming interpretation.)

Plus, surely there are many crusty middleboxes that will break if anybody tries to use that feature. Remember all the hoops websockets had to jump through to have much of a chance of working for most people because of those? Many break badly if anything they were not programmed to handle tries to pass through.


> Those key value pairs in chunked encoding ("chunk extensions") are spec'ed to only be hop-by-hop, which makes them more or less completely unsuitable for actually using by end applications.

Oof, I hadn't mentally connected those dots, but you're completely right. (As Transfer-Encoding is hop-by-hop, not end-to-end…)


Chunked encoding is used a lot by DNS-over-HTTPS servers, and it's a real pain to parse.

Lots of servers I've encountered with my browser, Stealth, actually violate the spec and send "lengths" that do not match the payload lengths that follow. Some reverse proxies also mess up the last chunk, violating the spec there too, and send a chunk with a negative length... and the spec doesn't even define how to handle this. I've also seen servers send random lengths in between, but without a payload that follows.

I would also like to add range requests (206 Partial Content) here. In practice, it's totally unpredictable how a server behaves when requesting multiple content ranges. Some reply with no range at all, even with correct headers. Some reply with more or fewer ranges than requested. Some even reply with out-of-bounds ranges that are larger than the Content-Length header of the same response, because they seem to use a faulty regexp on the server side.
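
For reference, a multi-range request and a roughly spec-conforming reply look something like this (URL, boundary and sizes made up):

  GET /video.bin HTTP/1.1
  Range: bytes=0-99,200-299

  HTTP/1.1 206 Partial Content
  Content-Type: multipart/byteranges; boundary=SEP

  --SEP
  Content-Type: application/octet-stream
  Content-Range: bytes 0-99/1000

  (first 100 bytes)
  --SEP
  Content-Type: application/octet-stream
  Content-Range: bytes 200-299/1000

  (100 bytes from offset 200)
  --SEP--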

It's a total shitshow.


"Chunked encoding is used a lot in DNS over HTTPS servers, and it's a real pain to parse them."

Always wondered if developers found that easy.

I just strip out the chunk lengths with a filter, suitable for use in UNIX pipes. It's like three lines in flex. I have always been aware of the different things that servers could "legally" do with chunking from reading the HTTP/1.1 spec, but as the parent says, no one ever does anything beyond the basic chunk lengths. For example, how many servers support chunked uploads?

With the filter I wrote, as crude as it is, I have never had any problems. Works great with HTTP/1.1-pipelined DoH responses.


> send a chunk with a negative length...and the spec doesn't even define how to handle this

Hmm. I suppose it isn't explicitly called out, but I think it's fair to say that such a request is a 400 Bad Request, as it doesn't match the grammar. (There's no possibility for a negative chunk length, as there's no way to indicate it.)
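
For reference, the relevant grammar from RFC 7230 §4.1 is (roughly):

  chunk      = chunk-size [ chunk-ext ] CRLF
               chunk-data CRLF
  chunk-size = 1*HEXDIG
  last-chunk = 1*("0") [ chunk-ext ] CRLF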


> Also, header values. They basically require implementing a custom string type to handle correctly; they're a baroque mix of characters & "opaque octets".

You are supposed to treat all of them as "opaque octets"... or something like this might happen:

https://news.ycombinator.com/item?id=25857729


You can't. At some point, you have to actually make use of the headers, and some of those uses require decoding to a string. There is some wiggle room here, such as doing things like,

  header_as_raw_bytes == b"chunked"
which I would argue is still decoding the header: your language of choice had to encode that string into bytes in some encoding in the first place, so even though you're comparing the encoded forms, there's still a character encoding at work.

But some headers have case-insensitive values, e.g. Content-Type, Accept, Expect, etc.

That golang bug is precisely not treating the non-characters (the "opaque octets", as defined by the standard, that is, the octets that form obs-text) as if they were characters. You won't hit that bug in the safe subset, presuming you're implementing other parts of the standard correctly. (Which is… a huge assumption, given HTTP's complexity, but that's sort of the point here.)


Also the ridiculous User-Agent header which everyone spoofs.


I heartily endorse surfing as Googlebot. It’s often a whole different web.


Oh I have to try this now. Thanks.


How exactly? What's the difference?


I used this tons for MJPEG streams.


I want to argue for the use of "Authorization".

What you pass in the "Authorization" header is a user identity, which is established through authentication. And the server uses this identity to decide whether you are authorized.


I've been searching for a while for a good way to know whether a client has disconnected in the middle of a long-running HTTP request. (We do heavyweight SQL queries of indeterminate length in response to those requests, and we'd like to be able to cancel them, rather than wasting DB-server CPU cycles calculating reports nobody's going to consume.)

You can't actually know whether the outgoing side of a TCP socket is closed, unless you write something to it. But it's hard to come up with something to write to an HTTP/1.1-over-TCP socket before you respond with anything, that would be a valid NOP according to all the protocol layers in play. (TCP keepalives would be perfect for this... if routers didn't silently drop them.)

But I guess sending an HTTP 102 every second or two could be used for exactly this: prodding the socket with something that middleboxes will be sure to pass back to the client.
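
Something like this is what I have in mind: a rough, hypothetical sketch (assuming direct access to the raw socket, and that run_query / cancel_query are your own hooks around the DB work, with run_query returning the body as bytes), not something most web frameworks will let you do today:

    import threading

    def handle_slow_request(sock, run_query, cancel_query):
        # Run the expensive query in a worker thread, and prod the client with
        # interim "102 Processing" responses once a second while it runs.
        result = {}
        worker = threading.Thread(target=lambda: result.update(body=run_query()))
        worker.start()
        try:
            while worker.is_alive():
                # A closed/dead client connection will eventually make this fail.
                sock.sendall(b"HTTP/1.1 102 Processing\r\n\r\n")
                worker.join(timeout=1.0)
        except OSError:
            cancel_query()  # client went away - stop wasting DB cycles
            raise
        body = result["body"]
        sock.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body)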

If so, that's awesome! ...and also something I wish could be handled for me automatically by web frameworks, because getting that working sounds kind of ridiculous :)


Actually using HTTP 1xx will trigger hideous bugs, and worse, those bugs will be timing-dependent. There is exactly one codepath used in the wild (non-websocket; websockets aren't really HTTP, they just pretend to be): preapproval for a large file upload.

This problem is one reason why success/error should NOT be the first thing to send. It should be a trailer.

(HTTP/HTML tendency to substitute the response body for a human-visible error would require another mechanism to "reset" the response body.)


> codepath used in the wild: preapproval for a large file upload

There is no current browser or client that will send an Expect: 100-continue header by default.

cURL removed it because it was too often broken. See https://curl.se/mail/lib-2017-07/0013.html

As of right now, while server authors will continue to need to support it, it is unlikely to be a well-tested code path, and it will likely break in weird ways if you even try to use it.

So pre-approval for a large file upload is not even valid anymore.


Alright, make that the only codepath ever used in the wild. Even more reason to avoid 1xx.


If using TLS, couldn't you send a zero-length application data fragment?

The TLS 1.3 spec states "Zero-length fragments of Application Data MAY be sent, as they are potentially useful as a traffic analysis countermeasure."

I guess that TLS libraries don't expose an API to do that, which is problematic for this approach.


Modern browsers reuse connections for subsequent requests so the connection may not be closed promptly, so this approach can't be relied upon anyway.


> You can't actually know whether the outgoing side of a TCP socket is closed, unless you write something to it

Wouldn't setting appropriate net.ipv4.tcp_keepalive_* and trying to read work?


> net.ipv4.tcp_keepalive_*

Like I said:

> TCP keepalives would be perfect for this... if routers didn't silently drop them.

There are lots of middleboxes that don't pass along empty TCP packets. TCP keepalive is in a similar situation to IPsec: great for an Intranet, or for two public-Internet static peers with a clear layer-3 path between them; but everything falls apart in B2C scenarios.

Plus, to add to this problem: HTTP has gateways (proxies et al.). Doing TCP keepalive on the server end only tells you whether the last gateway in the chain before the server is still connected to the server, rather than whether the client is still connected to the server.

Unless you can get every gateway in the chain to "propagate" keepalive (i.e. to push keepalive down to its client connection, iff the server pushes keepalive down onto it), silent undetected TCP disconnections will still happen—and even worse, you'll have false confidence that they aren't happening, as all your sockets will look like they're actively alive.

For what I'm doing, the client end isn't likely to have any gateways, so TCP keepalives "would be" workable for my use-case if not for the middlebox thing. But in full generality, TCP keepalives aren't workable, because there's always those corporate L7 caching proxies + outbound WAFs messing things up, even when L4 middleboxes aren't.

Keep your TCP keepalives for running connection-oriented stream protocols within your VPC. For HTTP on the open web, they're pretty unsuited. You need L7 keepalives. (If you've ever wondered, this is why websockets have their own L7 keepalives, a.k.a. "ping and pong" frames.)

> and trying to read

An HTTP client connection can legally half-close (i.e. close the output end) when it's done sending its last request; and this will result in a read(2) on the server's socket returning EOF. But this doesn't mean that the client's input end is closed! You have to do a write(2) to the server's socket to detect that.

And, since empty TCP packets aren't guaranteed to make the trip, that means you need to write a nonzero number of bytes of ...something. Without that actually messing up the state-machine of your L7 protocol.


You can do this with TCP keepalives [0]. Under Linux, a process can enable the SO_KEEPALIVE option on sockets to request that the Linux kernel send TCP keepalives periodically. A kernel option determines how frequently the kernel sends TCP keepalive packets. It is a single option that applies to all sockets of all processes. One can also configure the OS to enable SO_KEEPALIVE by default on all sockets of all processes.
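
In Python, for instance, turning it on for a connected socket is roughly a one-liner (a minimal sketch; the probe timing then comes from those sysctls):

    import socket

    def enable_keepalive(sock: socket.socket) -> None:
        # Ask the kernel to send TCP keepalive probes on this connection;
        # timing is governed by the net.ipv4.tcp_keepalive_* sysctls.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)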

Golang's GRPC library implements keepalives at the GRPC protocol level [1]. It provides a `Context` value [2] that code can use to detect peer disconnect and cancel expensive operations.

Golang's HTTP server API does not provide any way to detect peer disconnect before sending the final response [3].

Rust cannot set SO_KEEPALIVE [4]. One could possibly implement keepalives by writing zero-length chunks to the socket.

Java's Netty server library can set SO_KEEPALIVE [5]. One can then code a request handler that periodically checks if the socket is connected [6] and cancels expensive operations. Unfortunately, there is no standard tooling to do this.

EDIT: You did mention TCP keepalives. I was not aware that some routers drop them. Can you link to any data on the prevalence of tcp-keepalive dropping for various kinds of client connections: home router, corporate wifi, mobile carrier-grade-NAT?

[0] https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html

[1] https://pkg.go.dev/google.golang.org/grpc/keepalive

[2] https://pkg.go.dev/google.golang.org/grpc#ServerStream

[3] https://pkg.go.dev/net/http#HandlerFunc

[4] https://github.com/rust-lang/rust/issues/69774

[5] https://netty.io/4.1/api/io/netty/channel/ChannelOption.html...

[6] https://netty.io/4.1/api/io/netty/channel/Channel.html#isOpe...


I can't offer any data myself, but I can suggest that chasing up the reason that more "modern" runtimes like Go's and Rust's don't bother to expose SO_KEEPALIVE support — i.e. the discussions that ensued when someone proposed adding this support, as they certainly did at some point — would be a good place to find that data cited.

I can point out the obvious "analytical evidence", though: note how all the platform APIs that did expose SO_KEEPALIVE are from the 90s at the latest — i.e. before the proliferation of L4 middleboxes. And note how modern protocols like Websockets, gRPC, and even HTTP/2 (https://webconcepts.info/concepts/http2-frame-type/0x6) always do their own L7 keepalives, rather than relying on TCP keepalives — even when there's no technical obstacle to relying on TCP keepalives.


> Golang's HTTP server API does not provide any way to detect peer disconnect before sending the final response [3].

Iirc, the context mechanism can be used to detect the client's disconnection or cancellation of the request in some cases. From [1]:

> For incoming server requests, the context is canceled when the client's connection closes, the request is canceled (with HTTP/2), or when the ServeHTTP method returns.

[1]: https://golang.org/pkg/net/http/#Request.Context


Good. Thanks.


Another thing to note about custom headers is that when used in an XHR (eg: X-Requested-With), they will force a preflight request (with the OPTIONS method). If your webserver isn't configured to handle OPTIONS and return the correct CORS headers, that will effectively break clients.
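
For illustration (origin and path made up), the preflight exchange looks roughly like this, and if your server can't produce something like the second half, the browser never sends the real request:

  OPTIONS /api/data HTTP/1.1
  Origin: https://app.example.com
  Access-Control-Request-Method: GET
  Access-Control-Request-Headers: x-requested-with

  HTTP/1.1 204 No Content
  Access-Control-Allow-Origin: https://app.example.com
  Access-Control-Allow-Methods: GET
  Access-Control-Allow-Headers: X-Requested-With
  Access-Control-Max-Age: 86400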

Best to just never use custom headers.

I've written more about this here: https://developer.akamai.com/blog/2015/08/17/solving-options...


Yep, you've got to be careful with browser HTTP requests! Conveniently on this very same site I built a CORS tool that knows all those rules and can tell you how they work for every case: https://httptoolkit.tech/will-it-cors/


I see this a lot as an anti CSRF technique in AJAX based SPAs.


yeah, those techniques predate CORS, but even back then, you'd typically add your anti-csrf token to the payload rather than the header. CSRF is application level logic rather than protocol level.


> they will force a preflight request

That's why they're so great: use a custom header and never worry about CSRF issues.

Use a custom header and be sure that if the request comes from a browser, it was made by legitimate code from your origin.


I'm guessing based on the username that OP is the original author. I caught a typo that could trip a novice up if they're reading:

> This becomes useful though if you send a request including a Except: 100-continue header. That header tells the server you expect a 100 response, and you're not going to send the full request body until you receive it.

I’m guessing that should be Expect?

Overall interesting article, thanks for writing it!


Good catch! Thanks for that, now fixed.


Oh, hey Tim! Hope life is treating you well!


Haha, hey Will! The internet is a small world :-)


In the same section, there's also a reference to the 101 status instead of 100.


Another good spot, now also fixed, thanks!


Another one is that it's technically valid to have a request target of '*' for the HTTP OPTIONS request type. It's supposed to return general information about the whole server. You can try it out with e.g. `curl -XOPTIONS http://google.com --request-target '*'`

Nginx gives you a 400 Bad Request response, Apache does nothing, and other servers vary in whether they return a non-error code.

https://curl.se/mail/lib-2016-08/0167.html

https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html


I’m trying this out of curiosity, but getting (e.g.):

  HTTP/1.0 400 Invalid HTTP Request
My mistake, or are there other working end-points out there (I tried google, yahoo, and cbc.ca)?


You're not making a mistake, they just don't give an interesting response to this type of request. The only server I know of that actually uses it for something is Icecast (a music streaming server).


Ah!!! Perfect. I’ve got icecast in my life. Thx for the tip.


Since we're sharing our own WTFs;

You can include the same header multiple times in an HTTP message, and this is equivalent to having one such header with a comma-separated list of values.

Then there's WWW-Authenticate (the one telling you to re-try with credentials). It has a comma-separated list of parameters.

The combination of those two leads to brokenness, like how recently an API thing would not get Firefox to ask for username and password, because it happened to have put "Bearer" before "Basic" in the list.
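
To illustrate (realm values made up), a single header like this is perfectly legal, and a client has to work out that "Basic" starts a second challenge rather than being yet another parameter of the first one:

  WWW-Authenticate: Bearer realm="api", error="invalid_token", Basic realm="api"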

https://tools.ietf.org/html/rfc7235#section-4.1


This article [1] is a really great read on some of the pitfalls you encounter due to the way duplicate headers are parsed in different browsers (skip to "Let's talk about HTTP headers" if you want to jump right into the code).

[1]: https://fasterthanli.me/articles/aiming-for-correctness-with...


And some headers have their own exceptions to this.

The Set-Cookie header (sent by the server) should always be sent as multiple headers, not comma-separated, as user agents may follow Netscape's original spec.

  https://stackoverflow.com/questions/2880047/is-it-possible-to-set-more-than-one-cookie-with-a-single-set-cookie

  https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie
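
Concretely (cookie names and values made up), the safe form is:

  Set-Cookie: session=abc123; Path=/; HttpOnly
  Set-Cookie: theme=dark; Path=/
Joining them with a comma would be ambiguous anyway, since commas legitimately appear inside attributes like Expires=Wed, 21 Oct 2015 07:28:00 GMT.
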
On the other hand in HTTP/1.1 the Cookie header should always be sent as a single header, not multiple. In HTTP/2, they may be sent as separate headers to improve compression. :)

  https://stackoverflow.com/questions/16305814/are-multiple-cookie-headers-allowed-in-an-http-request


Fun fact: when HTTP first came out and was still hidden away in academia, there were no headers. Then the need for metadata was realized, and second-system syndrome took over to create a bunch of crazy headers. My favorite odd-ball header is "Charge-To:"

https://www.w3.org/Protocols/HTTP/HTRQ_Headers.html

This is not a custom X- header but an official header. Also Email: and some other odd headers were standardized at that time.


Referer being spelled wrong - I KNEW something was wrong about it every time I saw it but it never actually clicked.


I just figured it was one of those words spelt differently in American English, which most RFCs etc are written in. (British English native here.)


> (British English native here.)

That's why you spelled 'spelled': spelt :D


Har, this reminds me of an old bit that nicely distills both the mindset and spelling differences across the pond.

UK native, upon seeing the word "color" thinks: "Aha, US spelling!"

US native, upon seeing the word "colour" thinks: "Aha, a typo!"


I (not GP but also native BrE) reckon I use ~lled transitively and ~lt intransitively. I'm not aware of/haven't found anything to make a case for that though.

(Edit: or perhaps it's more about active/passive voice? Thinking particularly about burnt/burned.)

It's taken to an extreme by some thicker accented (dialected?) people around where I grew up though - an' so I turnt [turned] round (right-right round) an' said to 'im [...]!


It's a bit infuriating when English isn't your native language because I could never remember the right spelling.


I think I can take it on behalf of Americans and we'll just make "referer" a US spelling like we did with "thru".


The internet may help achieve what centuries of attempted English spelling reform has failed to do: simplification of spelling. There are two competing forces at play:

- readily available spell checkers

- the international interplay of ideas

The former stabilizes spelling, the latter says "screw it, through is spelled thru, I made up yeet, and emoji are valid grammar". But also grammar and spelling Nazis.

My money is on fluidity of language.


> HTTP 103: When the server receives a request that takes a little processing, it often can't fully send the response headers until that processing completes. HTTP 103 allows the server to immediately nudge the client to download other content in parallel, without waiting for the requested resource data to be ready.

Could someone explain why this needs a new status code at all? At the point where the new status code sends "early headers", the client was expecting the regular status code and headers anyway. Why could the server not simply do:

1) Receive request

2) Send 200 OK and early headers, but only send a single trailing newline (i.e., terminate the status line and last early header field, but don't terminate the header list as a whole)

3) Do the actual request processing, heavy lifting, etc

4) Send remaining headers, double-newline and response body, if any.

On the client side, a client could simply start to preload link headers as soon as it receives them, without waiting for the whole response.

This seems like it would lead to pretty much the same latency characteristics without needing to extend the protocol.

The only major new ability I see is to send headers before the (final) status code. But what would be the use-case for that?

Edit:

The RFC[1] sheds some light on this: the point seems to be that the headers sent in a 103 are only "canon" if they are repeated in the final response. So a server could send a Link header as an early hint, then effectively say "whoops, disregard that, I changed my mind" by not sending the header again in the final response.
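
The exchange looks roughly like this (adapted from the RFC's example); the hint only "sticks" because the Link header is repeated in the final response:

  HTTP/1.1 103 Early Hints
  Link: </style.css>; rel=preload; as=style

  HTTP/1.1 200 OK
  Content-Type: text/html
  Link: </style.css>; rel=preload; as=style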

I still don't see a lot of ways a client could meaningfully respond to that, but I guess it could at least abort preloading to save bandwidth or purge the resource from the cache if it was already preloaded.

[1] https://tools.ietf.org/html/rfc8297#section-2


As with a 100, a 103 is tentative — it doesn't guarantee that the final result will be 2xx. This can happen if e.g. your web server is responsible for sending the early hints, before proxying to your app server.


Hmm, could 'no-cache' actually meaning 'please cache it' have been the reason why those damn Internet Explorers cached AJAX responses in the past, when the only way to solve it was to append a random query parameter?


Seems like one of those strings you could search Github for to find thousands of bugs at once.


Reading that, I finally understood so many hours of debugging throughout my life.


It sends a revalidation request, so that's unlikely in theory, but probably many [shared] hosting providers set up a forced cache on the server side... or otherwise the server side was wrong. (After all, if someone used no-cache instead of no-store, maybe they mucked up something else too.)


I've been running public web servers for decades, and almost all of this was new information. Excellent article!

Fun fact, reddit used to have 'X-Bender: Bite my shiny metal ass' on every response. Sadly they seem to have removed it.


You're thinking of robots.txt https://www.reddit.com/robots.txt and it's still there


Ah yes, bender was in the robots file. But we also had a funny X-header. Maybe you can find it in GitHub in the haproxy config.


Not there, but it does have "x-moose: majestic"


> User-Agent: bender

> Disallow: /my_shiny_metal_ass

This was there when I checked just now; was it removed and re-added?


Yes, I saw that, but the parent comment was about it being a header ;)


The way I see it, the parent said bender is in robots.txt, and you said it's not, and then you also said x-moose was in robots.txt :)

You thought you were replying a level higher.



I thought it was Slashdot that had the X-Bender header?


I used to encourage back-end web developers to write a web server from scratch as a learning exercise. With HTTP/1.1 it was actually pretty easy to write one in C (plus Berkeley sockets); the idea being that you learn a lot about how things actually work at the lowest level without spending an inordinate amount of time. It's not really practical with HTTP/2 anymore, but in any case, having done my own exercise, I had no idea about many of these quirks.


https://doc.rust-lang.org/book/ch20-00-final-project-a-web-s...

The Rust Book has an awesome "final project" where it walks you through building a multi-threaded web server. If you're a battle-hardened C/C++ dev looking for an inroad to Rust, this is a great place to start.


Never touched Rust but having skimmed through this, looks like a fantastic tutorial.


Thank you!


Seems like you are the author of the book. Just wanted to say that this book makes me want to pick up Rust even though I have no specific goal for it, because the book is appealing in writing and appearance, layout and illustrations, ideas and execution.. basically good job and thank you!


One of two authors. I’ll share this with my co-author, thanks a ton :)


I'm also here to worship your work! The Rust book is one of my favorite documentations around, and just the other day I sent it to a colleague who was interested in learning Rust. Even though he only had experience in Typescript and Java, he made a working chess engine less than three days later.


Other co-author here, that's great to hear!!


I'm the other co-author, thank you for sharing this! It means a lot to me <3


I also enjoyed the Rust Book. Thanks so much for the effort you and your co-author put into the book. And thank you for your contributions to the Rust language and tooling. I learned Rust over the difficult last year. It brings me some joy and satisfaction.


Glad you like the book and Rust!! <3


I teach web development and distributed systems at a local university, and one of my lab exercises is building an HTTP/1.0 server in Python with sockets. I do have a blog post [1] that shows how to do it, if someone's interested.

[1] https://joaoventura.net/blog/2017/python-webserver/


Some HTTP headers support extended parameters: parameters with a "*" after them, which allow specifying the character encoding of the header value, e.g. UTF-8. Confusingly, they also support sending both regular and extended parameters in the same header.

  https://tools.ietf.org/html/rfc5987
E.g. sending the file "naïve.txt" using the Content-Disposition header.

  Content-Disposition: attachment; filename=na_ve.txt; filename*=UTF-8''na%C3%AFve.txt
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Co...

  The parameters filename and filename* differ only in that filename* uses the encoding defined in RFC 5987. When both filename and filename* are present in a single header field value, filename* is preferred over filename when both are understood.


> X-Shenanigans: none - this appears on every response from Twilio's API. I have no idea why, but it is comforting to know there's definitely no shenanigans this time round.

This made me laugh.

Incidentally, reminds me of a company I used to work at where one of the devs thought it was hilarious to return 418 (I'm a teapot) [1] for all bad requests. Unfortunately sometimes these were actually 5xx-level errors, so it quickly became annoying. The Twilio header listed above seems fairly innocuous though.

[1]: https://tools.ietf.org/html/rfc2324#section-2.3.2


This is both a great post and an effective ad - I've been looking for a lighter-weight Postman alternative (and HTTPie, while nice, is no substitute for a graphical UI for such a thing). Will check HTTP Toolkit out!


Thanks! It's a difficult balance to strike; I've taken to just trying to write great HTTP articles and ignoring the advertising angle entirely, which seems to be working OK.

Do try out HTTP Toolkit and let me know what you think, but it's not a general purpose HTTP client like Postman or HTTPie. It's actually an HTTP debugger, more like Fiddler/Charles/mitmproxy, for debugging & testing. A convenient HTTP client is definitely planned as part of that eventually, but not today.


Ah, gotcha. I actually do have a good use case for that as well (and do think they could go together nicely someday), so I'll still check it out!


In case anyone wants a native (Cocoa) REST client for macOS, there's https://paw.cloud. It's paid, but they sometimes give away free licenses for retweets, which is how I got mine.


Not that effective - I've also been on the lookout for this, but without your comment I wouldn't have realized that's what this website was offering.


Seconded. This was a great blog post, I think a more visible plug at the end is more than justified.


Insomnia.rest


If I'm not mistaken, Insomnia is also using Electron. I wouldn't really call it a lightweight alternative to Postman.


Lampooning the Cache-Control header is all fun and games, but remember it was designed in a time when Internet access in big organisations was often behind a caching proxy like Squid. With that in mind, the explanations at https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ca... make good sense.


No, not really - those explanations are nothing more than what's in TFA, and they don't help it make any more sense. There's no reason in the world why no-cache should mean 'cache this please but with caveats', when we have all of the English language available to come up with an alternative header.


I always liked this http caching article, done in a conversational tone: https://jakearchibald.com/2016/caching-best-practices/


Well, I can see how someone fixated on "store" vs "cache" terminology might arrive at that name.

Browsers store, proxies cache, so it should be no-cache, obviously!

Sure, it's stupid, but naming is hard and these things happen all the time.


Caching in this context means “no need to ask the server for a new copy of this within the cache lifetime”. no-cache then does what it says: You can store it if you like, but you need to check with the server before reusing it.

That might be a little counter-intuitive, but if you read the definitions of the words, it does make sense.


But there was must-revalidate too, right? What's the difference?


> must-revalidate, max-age=0: if the server doesn't respond, it may give you the cached entry

> no-cache: won't give you a cached entry without validation

> must-revalidate, max-age=10: will revalidate only after the time has expired


no-cache: you MAY store this, but you MUST revalidate with the server before using it

must-revalidate: you MUST revalidate with the server before using this after it expires

They overlap but address different things; no-cache is for cacheability while must-revalidate is for validity. I don't think either of them are named very well.
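
My own rough cheat-sheet (a summary, not spec wording):

  Cache-Control: no-store                      # don't keep a copy at all
  Cache-Control: no-cache                      # keep a copy, but revalidate before every use
  Cache-Control: max-age=60, must-revalidate   # fresh for 60s; once stale, never use without revalidating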


Let me try this...

The "no-cache" was a hint not about caching the content but about caching subsequent requests, and could optionally specify specific fields that would indicate that a new candidate request needed to be sent to the server as the content might be different. There's this reality that just to render content, the browser effectively must have a cached copy of the content, so the notion that the response wouldn't be cached wasn't really even in the cards. Whether you used the cache or not was a decision made at the time you were sending a request, not when you were consuming the response.

The "no-cache" directive meant, "hey, don't check for a cached copy of the content, just go fetch new content". It was often used by analytics pieces so that the server could count how often content was looked at.

Back in the day you had terrible latencies (particularly over dialup). You also had issues with horribly asymmetric bandwidth that meant the data you sent could become the bandwidth bottleneck (outbound bandwidth constraints would mean ACK packets would get queued up, delaying downloads even when you had plenty of download bandwidth), and of course HTTP requests weren't terribly compact, so this could really make a big difference.

Caching requests was a big deal. Performance could be improved significantly by "cheating" and just not sending a new request, and this led to some very aggressive caching strategies. The "check if the content really is different, and just use the original copy if it isn't" hack was a pretty common one. If nothing else, it saved the browser the overhead of re-rendering the page and the accompanying annoying user experience of seeing the re-render.

The original protocol didn't have any notion of no-store, and specifically mentioned that "private" didn't really provide privacy, but more that the content should be "private" in the sense that only the browser itself should store the content. Again, there's an assumption that the browser is going to put everything it gets into a "cache", because it has to.

You could use "max-age", but a lot of caches would still shove the object in their cache and only expire it on a FIFO basis or when a new request was to be sent (and it was vulnerable to clock skew problems). Sounds dumb, but it was the kind of dumb that kept code simple and worked pretty well.

So now that the practices were in place, you need a new directive to say, "hold up, that old approach is NOT a good idea here". So they came up with "no-store" as a way to say, "don't even put it in the cache in the first place".


NotOnly-cache?


They screwed this all up.

Private? No. Cache!


Woah I had no idea about these 100 responses. Looks like there are quite a few of them on the Internet:

https://beta.shodan.io/search/report?query=http.status%3A%3E...


"X-Clacks-Overhead: GNU Terry Pratchett - a tribute to Terry Pratchett, based on the message protocols within his own books."

I'll enjoy knowing this next time I reread Going Postal :)


Here's a site about it: http://www.gnuterrypratchett.com/ with snippets (etc) to configure it into your servers and apps.


With nginx it's as easy as:

    add_header X-Clacks-Overhead "GNU Terry Pratchett";
in your server{} block.


> X-Http-Method-Override - used to set a method that couldn't be set as the real method of the request for some reason, usually a client or networking limitation. Mostly a bad idea nowadays, but still popular & supported by quite a few frameworks.

This is popular and supported because it serves a real need. A GET request shoves all the query parameters into the URL, and if that gets long enough, it will be truncated somewhere and the results will be wrong or you'll just get an error.

By instead sending a POST request, where all the parameters go in the body and don't get truncated, but using the header to tell the server to treat it as a GET request, you solve those bugs almost seamlessly.
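
Concretely (endpoint and parameters made up), the request looks something like this, and the framework routes it as if it were a GET:

  POST /search HTTP/1.1
  Host: api.example.com
  Content-Type: application/x-www-form-urlencoded
  X-HTTP-Method-Override: GET

  q=some+very+long+query&page=3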

The drawback is that you don't have a bookmarkable/linkable URL - some of the semantics are lost. But that can be worked around afterward with an id or hash of a previous request's parameters that tells the server "give me the same results as if I had entered all the query parameters that were entered in this previous request".

It's not optimal, but pragmatic.


What a delight! I implemented an HTTP server from scratch (well, from RFC) in Objective-C some years ago. Many of these hit pretty close to home. Lots of plot twists in those RFCs.


If I could have one wish for HTTP, it would be to get rid of CORS. Not only is it an incomplete solution (it only protects reads, not writes), it causes tons of difficult-to-debug errors. A half measure in security is often worse than no measure at all, because the latter is more obvious and is a much squeakier wheel to fix.

The Origin header should be the only way to fix these issues, which would force web frameworks and servers to check that header by default. Way simpler: authentication (authenticating the requesting site, not the end user) is done on the server, not the client; errors can actually be handled since they don't fail silently; and failures appear in server logs as well as on the client.


@author: Every time I click anything in the page, the whole page flashes; presumably React is re-rendering for some reason. As someone who highlights text as I read, it was quite an interesting experience :)


Hmm, that's very weird. I don't see it myself in the latest Firefox or Chrome. What browser & OS?


Firefox 85 on macOS. It happens anywhere I click on the page; the body text very quickly flashes off and back on again.


Ok, thanks, I'll look into it.


Happens for me too- Firefox 86 on Kubuntu 18.04


My current favorite is chunked encoding.

Does amazon.com really make its page more performant by sending 25 chunks of less than 2k each, some less than 50 bytes, while I'm trying to grab 115k for a page?

It's all so weird to me.


From my experience (not with amazon) these strange chunk sizes come from non-blocking IO. When a source gets some data and triggers select/poll/epoll/whatever, the callback (or equivalent) immediately writes it out as a chunk.

This works even better in HTTP/2 or HTTP/3 / QUIC. A Go server reading from a lot of microservices can produce pretty weird output on HTTP/2 because now not only is it in odd sizes determined by network timing, it doesn't even need to be in order.


It may make a huge difference for their servers, which can generate 2 kB of data and send it to you right away, instead of generating the entire 115 kB before they can send anything.

Those 2 kB are a bit too small for top network performance, so you may see a negative impact. But if they increase it to something like 10 kB, it's harmless.


I can get it sometimes, but some sites are just bizarre. twitter.com sends out a few dozen 74-byte chunks. I can't find it now, but I've seen pages composed of chunks only 10 or 20 bytes big. So much overhead.


Some frameworks make it really easy to create reusable code that calculates something, pushes it into the network, and returns to the rest of the page.

You are right that it's not a great thing to do. A little bit of buffering on the sender can improve things a lot. But it's an easy thing to do, so people do it.


Usually it's because the app doesn't know the length of the entire response body up front and wants to start sending the response before buffering the whole thing. The 50-byte chunks probably aren't that useful, but they can happen as a consequence. Something like Nagle's algorithm can prevent those small chunks, but then there would likely be higher latency.


This reminded me of a post I wrote a couple of years ago: https://honeyryderchuck.gitlab.io/httpx/2019/02/10/falacies-...


At least you didn't complain about the spelling of Referer, with Falacies in the title. Two L's.


Internet don't care 'bout Shakespeare.


> X-Requested-With: XMLHttpRequest - appended by various JS frameworks including jQuery, to clearly differentiate AJAX requests from resource requests (which can't include custom headers like this).

What does the author mean by this? Why can't a "resource request" include custom headers? I am assuming that a "resource request" is just a non AJAX request. Any HTTP client should be able to include whatever headers they want no matter the source.


I think they mean requests from a browser.


I don't understand. A browser could choose to include or not include the header in question.


> X-Powered-By: <framework> - used to advertise the framework or technology that the server is using (usually a bad idea).

I was always a fan of setting this to either be silly (X-Powered-By: magical elves) or an outright lie (tell 'em it's some Ruby thing when it's ASP.NET)


X-powered-by: emacs


This is a really fun read. Good balance of tech details and dry humor.


Cool article, interesting read...

But also...cool tool!


This is a little bit of a tangent but I sure love working with Websockets. It really feels like Websockets are what HTTP should've been. Asynchronous realtime communication.

When things happen on the site and it's shown to the customer immediately via WS, it's just a delightful experience.


imo, Server-Sent Events are the better solution for realtime updates. Sometimes you need the stateful bidirectional protocol WebSockets offer, but most of the time HTTP for RPC and SSE for streaming updates gets you where you need to go with standard HTTP, no special protocols.
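
For anyone who hasn't used it, an SSE stream is just a long-lived response with a text/event-stream body (event names and payloads made up here), and new EventSource(url) on the client handles parsing and reconnection for you:

  HTTP/1.1 200 OK
  Content-Type: text/event-stream
  Cache-Control: no-store

  event: order-updated
  data: {"id": 42, "status": "shipped"}

  event: order-updated
  data: {"id": 43, "status": "packed"}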


There's one scenario where websockets win: When you don't want your client-to-server calls to get reordered in-flight.

Unfortunately the browser javascript "fetch API" still doesn't implement streaming request bodies. If it did, websockets would be obsolete.


In its day, I don't think HTTP would ever have escaped orbit if it had been designed as a stateful protocol, like Websockets.


Web goes round and round. ActiveX, Java applets, and Flash all supported sockets. It is nice that we can have them now without such things, but it's not like they're only a boon with no tradeoffs.


Anyone have a link for those MIME refresh animations?


You mean multipart/x-mixed-replace? https://www.oreilly.com/openbook/cgi/ch06_06.html

I once wrote a prototype video surveillance system with that, sending multiple still images, before video streaming was a thing. 1997, that was a long time ago...


Yeah it's really fun but not supported by Chrome anymore.


I am so confused by all of this, and as a child I was pretty good at everything computers, especially hardware. I built the earliest towers from scratch. I am now fully disabled and in absolute poverty, with not much chance of getting away from it (I have witnessed poverty in other countries, so I know it could be worse, so please do not feel bad for me) due to a festering 5-year-old wound that will not heal and no health care.

Reading thread after thread I may feel confused, but it is helping me catch on to the network and software side of the lingo, and hopefully skills I can use to be able to go to a grocery store one day instead of begging one of two people to not ignore me on the day they said they would take me by the food bank. Please tell me what I should look at learning first to try to have something useful I can do from home. I have a laptop and a TV I use as a secondary monitor, and that is about it, and that is all I need to try to get ahead.

I was completely robbed by MinerGate for what is now almost one hundred thousand in crypto (it was still 5500 then) and have all those records. They finally responded days shy of seven months later, saying I should have contacted them sooner and they could have helped me. It was clearly them that did it, because every single one of the currencies I had mined via the app for so very long was taken out within 5 hours, and I did not get any notifications as this happened, for some reason. There is so much to that story, and the recent theft from me is from miningcompany.ltd: I had only the last of my LTC invested for mining power and it was working fine, then suddenly they turned off their site and many lost tons. This is just complete BS, because the UK government is not providing answers as to what people can do to recoup from a company they legitimized and issued licenses to, etc. etc.

The big huge thing is that I am awaiting a good ol' infection to be worried about enough in an ER to, I guess, get my right leg lopped off from the knee down, ending the daily continuous pain and other pains caused by my tripping on this weak, fat, worthless body part; this is all something I could have handled, with electricity bills paid as they were before MinerGate stole from me (they are tiny, as I use little energy). Maybe not be forced to fast, and look at it positively somehow, until it reaches too many days and I can no longer see the benefit. I am embarrassed to be seen by all of my old friends because I cannot blame things on other things, but the injury happened and I lost my business and consolidated everything into the few bare necessities and hardware to mine with MinerGate, because I knew no other way at the time, and it worked fine until I had nothing left. I thought of many negative actions after this and just tried to mine what I could, but it was 4 days before a bill came due that had to be paid after the visit. (Ha, I never got to afford any of those visits, but Mission Arlington pulled many teeth from what spawned due to my appointments no longer being able to be met (can't pay with IOUs) at the appropriate doctor.)

I GUESS I AM JUST ASKING WHAT CAN I DO TO LEARN THE BEST POSSIBLE THING AND BE ABLE TO EAT RIGHT AND MAYBE SAVE MY LEG! IT WOULD BE DEPRESSING CONSTANTLY, BUT IT ONLY HURTS DEEPLY AT TIMES, UNTIL I REMEMBER THERE ARE TRAFFICKED KIDS AND OTHERS IN POVERTY IN THE THIRD WORLD! ANY ADVICE PLEASE HELP ME! Jpmacp@gmail.com or macpjp@gmil.com. I would write a book, but who wants to hear about this until I lose my leg or get sepsis from it.

I will show you all live video and photos (GRAPHIC AND SAD) for any advice, so you know I am not some scumbag moocher so often found online.

JP!

I figured, after a week of reading comments from so many beautiful brains, who better to ask, ya know? I wish I had not gotten in with a wild group to avoid getting beat up all the time when young, and had stayed on the path I was on. Those beatings are nothing; I could deal with them all every day and be better off than my current day to day.

MAYBE. All I know is I know nothing at all for certain.

I am great at business and cannot even get enough funds together for any business, unless I wanted to just Break Bad (I certainly could, now that I have met so many people society would think are terrible, but they are not, and they offer to help if I ever can, blah blah blah...)

THANK YOU very much for reading.



