HTTP/2 Continuation Flood: Technical Details (nowotarski.info)
296 points by campuscodi 89 days ago | 32 comments



I'd just mitigated this exact thing in Bandit last month!

https://github.com/mtrudel/bandit/blob/main/lib/bandit/http2...

TBH, from an implementor's perspective this is a super obvious thing to cover off. It had long been on my radar and was something that I'd always figured other implementations had defended against as well.
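
For illustration, here's a minimal Go-style sketch of the general mitigation (Bandit itself is Elixir, so this is not its code; the names and the 16 KiB cap are made up): accumulate header block fragments from the HEADERS frame and any following CONTINUATION frames, and abort the connection once the total exceeds a fixed limit.

    // Sketch only: names and the limit are illustrative, not Bandit's API.
    package http2sketch

    import "errors"

    // e.g. cap the total header block at 16 KiB per request
    const maxHeaderBlockBytes = 16 * 1024

    var errHeaderBlockTooLarge = errors.New("header block exceeds limit; closing connection")

    // headerAccumulator collects header block fragments from a HEADERS frame
    // and any number of CONTINUATION frames that follow it.
    type headerAccumulator struct {
        buf []byte
    }

    // addFragment appends one fragment and enforces the cumulative limit, so an
    // attacker cannot keep streaming CONTINUATION frames forever.
    func (a *headerAccumulator) addFragment(fragment []byte) error {
        if len(a.buf)+len(fragment) > maxHeaderBlockBytes {
            // caller should send GOAWAY and close the TCP connection
            return errHeaderBlockTooLarge
        }
        a.buf = append(a.buf, fragment...)
        return nil
    }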


Well you know what happens when we assume. You make a front page headline out of you and me.


As someone who worked for a terrible startup that 'assumed' they would have scalability issues, engineered their entire software stack around solving said issues, and ended up with a worthless codebase that nobody could wrap their head around as a result, I feel this comment.

Later they did a small refactor that easily handled the loads they had been "assuming" couldn't be handled that way; it was wildly successful and the code was much simpler to work on.

To developers: don't over-engineer. Most languages/frameworks/libraries can handle scale beyond what you'll ever get in your initial implementation. No, your entire website does NOT need to be asynchronous. It is very possible to have too many background jobs. I know this because I've seen the horror. I've also written an entire jobless/synchronous platform that serves millions of users without issue. If you run into scaling issues, that is a good problem to have. Tackle it as it happens.

Bottom line is focus on secure, quality code above all else. Don't make assumptions.


The default way we write applications is actually pretty scalable already.

It always hurts to build something that “won’t scale” because it was framed as a negative.

Realizing that something “scales” if it meets your current needs is pretty important.

Framing scale in terms of how many people can work on it, how fast they can work on it, and how well it meets needs is often a better way of considering the "scale" of what you're building.

As you said, when requests per second become a limiting factor you can adjust your scaling, but doing it from the start rarely makes sense (largely because req/sec already scales pretty well).


It’s often a fear or trauma response. Nobody wants to spend 6 months out of the year trying to keep the plates spinning, and they definitely don’t want to spend 60 hours a week when things get ahead of them. Everything takes twice as long as we think it will and we don’t trust that we can keep ahead of user demand. Many of us have experienced it or been adjacent, and for a long time after we overreact to that scenario.

Because we don’t trust that we can keep the wheels on.

Over time the memory fades, and the confidence improves, and we get more comfortable with things being okay instead of unassailable. But it can be a rough road until then.


Yeah, and out of that fear, people often use stacks that require vast amounts of knowledge to actually keep things working at all, at any scale. Kubernetes is the best example of "I don't trust myself to keep the wheels on, so I'll pick it because it's scalable."


> In the last couple of months I checked dozens of implementations and, somehow, these protections were not implemented (or implemented incorrectly) even in major HTTP/2 servers.

I'll speak to the elephant in the room: this is what happens when you have an entire developer culture so used to automatically dynamically expanding everything and not caring how big it is, that they never think about how big something can be.

This class of problems isn't necessarily restricted to HTTP/2, although its gross complexity probably contributes; it's just that in HTTP/1.x times, more developers would be used to languages like C where managing buffer lengths takes constant attention, and no one would bother to make header allocations expand limitlessly when they should be a few K in total at most for the whole request.


The issue is that people constantly focus on and optimize for the happy path, but don't stop and think about what would happen if an adversary deliberately and repeatedly triggered the worst case scenario. So many denial-of-service attacks (slowloris, query parameter hash collisions, etc.) come to fruition because bounded resource usage is an afterthought.
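
As a concrete, if generic, example of bounding these resources up front: Go's standard net/http server exposes a handful of knobs that cap exactly the things slow-client attacks abuse. A rough sketch (the values and file names are arbitrary placeholders, not recommendations):

    package main

    import (
        "log"
        "net/http"
        "time"
    )

    func main() {
        srv := &http.Server{
            Addr:              ":8443",
            Handler:           http.DefaultServeMux,
            ReadHeaderTimeout: 5 * time.Second,  // bounds slowloris-style dribbled headers
            ReadTimeout:       30 * time.Second, // bounds slow request bodies
            WriteTimeout:      30 * time.Second,
            IdleTimeout:       60 * time.Second,
            MaxHeaderBytes:    16 << 10, // cap total header bytes per request
        }
        // cert.pem / key.pem are placeholders
        log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
    }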


> NOT affected: Nginx, Jetty, HAProxy, NetScaler, Varnish. [0]

0: https://nowotarski.info/http2-continuation-flood/


In other words, the ones that were already pushing back against the use of CONTINUATION, due to the risk of DoS, 10 years ago. Just read any of the long threads there to get an idea; it's always about how to avoid the nasty CONTINUATION: https://lists.w3.org/Archives/Public/ietf-http-wg/2014JulSep...

If it had at least been agreed to forbid CONTINUATION after a non-full HEADERS frame, the protocol would have been more robust, but the perception was that it would make the encoding job itself harder (byte boundaries in compressors, etc.).

BTW I find it funny how we "rediscover" the same stuff every 10 years. Recently it was the well-known RST_STREAM flood, now CONTINUATION, soon it will probably be DATA frames of length zero, then single-byte WINDOW_UPDATE frames, then SETTINGS_INITIAL_WINDOW_SIZE changes that cost a lot of CPU, etc. The world is just circling in this security circus, provided it's possible to assign a name and possibly a logo to a known problem...


What about Caddy? It's a great project that deserves its own line ;)


On Ubuntu 22.04 LTS caddy from the Ubuntu apt repo is shown as on version 2.7.6 and built with Go 1.21.5. That version of Go does not have a fix for this issue. Caddy 2.7.6 is also the latest version released on GitHub.

So no fix yet, but I think all that's needed is a recompile with the latest Go version, 1.22.2.


I think that recompiling with upgraded Go will not solve the issue. It seems Caddy imports `golang.org/x/net/http2` and pins it to v0.22.0 which is vulnerable: https://github.com/caddyserver/caddy/issues/6219#issuecommen....
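
If that's accurate, the fix would presumably be bumping the pinned module rather than just the toolchain. A hypothetical go.mod change (the version number is assumed here to be whichever golang.org/x/net release carries the CONTINUATION fix):

    // go.mod sketch (hypothetical): bump the pinned module, run `go mod tidy`, rebuild
    require golang.org/x/net v0.23.0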


Looks like it's been fixed if you recompile from master as of a few minutes ago


Previous article with impacted web servers/reverse proxies from the same author.

https://nowotarski.info/http2-continuation-flood/


This has been at the top all day.

I wonder: For low-traffic websites, is it possible that running HTTP/1.1 is just safer?


HTTP/1.1 is far easier to implement, thus it is reasonable to assume it should contain fewer bugs.

HTTP/2 (and HTTP/3) is vastly different in features (added multiplexing, windowing, HPACK, etc.). All this transforms a largely stateless connection (in the HTTP/1.1 case) into a stateful one. And in order to maintain the stateful connection, you need to store some data (state, configuration, etc.), hence all these problems.

Also, since HTTP/2 adds multiplexing, the protection characteristics are different. For example, if the connections come from a CDN pulling from your origin, you may allow a smaller number of connections, each with a large pool of multiplexed streams; but if the connections come from direct user access, you may want to allow a large number of connections, each with fewer multiplexed streams. In HTTP/1.1, protection is much simpler, since everybody looks almost the same.
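
To make those knobs concrete, here's a sketch using the server configuration in golang.org/x/net/http2 (the field names exist in recent versions; the values are purely illustrative, and the "right" ones depend on whether your clients are CDNs or end users, as above):

    package main

    import (
        "log"
        "net/http"

        "golang.org/x/net/http2"
    )

    func main() {
        srv := &http.Server{Addr: ":8443", Handler: http.DefaultServeMux}

        // Fewer connections, each with many streams, may suit CDN-style clients;
        // more connections with fewer streams may suit direct end-user traffic.
        if err := http2.ConfigureServer(srv, &http2.Server{
            MaxConcurrentStreams:         32,      // cap multiplexed streams per connection
            MaxReadFrameSize:             1 << 20, // cap individual frame size (1 MiB)
            MaxDecoderHeaderTableSize:    4096,    // bound HPACK dynamic-table memory
            MaxUploadBufferPerConnection: 1 << 20, // bound per-connection flow-control window
        }); err != nil {
            log.Fatal(err)
        }
        // cert.pem / key.pem are placeholders
        log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
    }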


Correction: the statement "reasonable to assume it should contain fewer bugs" is not true. The correct idea is "it should be easier to implement an HTTP/1.1 server safely because HTTP/1.1 is a simpler protocol compared to HTTP/2".


Fortunately, continuation frames don't exist in HTTP/3.


The QUIC protocol itself is certainly sophisticated, but implementing it safely might be a different story. I'm still skeptical about the overall "safety"/protection of HTTP/3.

A QUIC UDP server is definitely going to need to store state data to maintain a connection/session, and now you also have the good old UDP security concerns (packet flood protection, etc.) mixed in. I guess time will tell.


Upgrading merely to upgrade is not good engineering practice. If you expect to receive no additional benefits from the upgrade then it is probably not justified.


In an ideal world, yes. But most open source packages fix security issues only for the last 1-2 major versions. Which means upgrading just to upgrade is a good practice, as you don't have to worry about package changes at the moment you're upgrading to fix a security issue.


Unfortunately, there's also a lot of parroting that upgrading is a "best" practice.


COBOL was great until it wasn't.


Would not say that COBOL is still great (never wrote it), but the main problems in COBOL horror stories are usually mismanagement and underinvestment, not the technology.


Not necessarily. Everyone saying HTTP/1.1 is simple has never implemented a full real-world-compatible parser for it.

HTTP/1 has lots of unobvious edge cases and legacy quirks. The text format has way more flexibility than it seems from looking at valid headers. It has obscure features like multi-line headers and old MIME features, 100-continue race conditions, custom hop-by-hop headers, and GET bodies.

Fortunately new HTTP RFCs document many pitfalls. If you just implement what RFC 2616 said, you won't have a safe implementation.

The actual size of a request or response can be specified in multiple ways (at the same time, with conflicting values), and depends on a combination of several features and on the values of headers with weird parsing rules needed for backwards compat, so "simple" HTTP implementations can be tricked into request smuggling.
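
As a toy illustration of one such defence (a sketch, not a real parser; the helper name is made up): refuse any request that declares its body length in more than one way.

    package httpsketch

    import (
        "errors"
        "net/http"
    )

    var errAmbiguousLength = errors.New("both Content-Length and Transfer-Encoding present")

    // rejectAmbiguousLength illustrates one anti-smuggling rule: if a request
    // carries both Content-Length and Transfer-Encoding, the two can disagree,
    // and a front-end proxy and back-end server may pick different body lengths.
    // RFC 9112 says Transfer-Encoding overrides Content-Length here and flags it
    // as a likely smuggling attempt; a strict server can simply reject it.
    func rejectAmbiguousLength(h http.Header) error {
        if h.Get("Content-Length") != "" && h.Get("Transfer-Encoding") != "" {
            return errAmbiguousLength
        }
        return nil
    }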

Either way you need a robust, well-tested mature implementation.


I was wondering about that. It is more mature and less complex, so it seems probable it is safer.


Probably. HTTP/2 is good for streaming, and even that is being replaced by newer protocols.

For normal asset serving, the only advantage is that more assets can be loaded in parallel, since HTTP/1 is limited in connections per domain. CDNs on different domains usually prevent this from being an issue.

In theory you could serve unbundled JS assets via HTTP/2, but I have never seen it in production. Likely because you still need a compilation step most of the time.


Nice writeup and great find! Kudos to the author for taking such a broad approach and responsibly reporting their findings and finally for sharing the details in such a readable way.


Now do this slowly, and you can call it slowloris v2 :(


I just love this typo:

> After serveral retries


HTTP/2 or How to Cram a Transport Layer "Upgrade" Into an Application Layer Protocol.



