
What's more interesting is why `Content-Length abcd:` is treated the same as `Content-Length:` at all. Did someone over-optimize the header lookup? If so, perhaps other kinds of extensions, like `Content-Length-abcd`, are possible too, not only ones with a space?



More likely, they stop at the first space OR colon when parsing the header name, since "Content-Length : 0" is valid.
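
A minimal sketch of the parser this comment hypothesizes, assuming the header name ends at whichever comes first, a space or a colon, and the value is whatever follows the first colon. `naive_parse_line` is a hypothetical name, not code from any real proxy.

```python
import re

def naive_parse_line(line: str) -> tuple[str, str]:
    """Hypothetical lax parser: the header name ends at the first space OR colon."""
    # Name: everything before the first space or colon.
    name = re.split(r"[ :]", line, maxsplit=1)[0]
    # Value: everything after the first colon, surrounding whitespace stripped.
    _, _, value = line.partition(":")
    return name.lower(), value.strip()

for line in ["Content-Length: 5", "Content-Length : 5",
             "Content-Length abcd: 5", "Content-Length-abcd: 5"]:
    print(repr(line), "->", naive_parse_line(line))
```

Under this guess, the first three lines all come out as `content-length`, while `Content-Length-abcd` stays a distinct name, because `-` does not end the name.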

Personally, if I were writing an HTTP request parser while being lazy about enforcing the spec, I'd split ONLY on the first colon, then strip the whitespace on both sides of the header name and the value. In Python:

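    # Split only on the first colon; everything after it belongs to the value.
    header, value = line.split(':', maxsplit=1)
    header = header.strip().lower()  # header names are case-insensitive
    value = value.strip()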
After that, `header` should ALWAYS be checked via equality, and never `.startswith(...)`.


Note that parsing is likely more complicated than your code, because it assumes each “line” has already been identified before it is parsed. AFAIK there is an escape sequence for the header delimiter (\r\n).
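
A sketch of the step that the snippet above skips: finding the end of the header section and cutting it into lines before any per-line parsing. It assumes the whole header section is already buffered as bytes and that only CRLF ends a line (a real parser also has to decide what to do with a bare LF); `split_header_lines` is a hypothetical helper.

```python
def split_header_lines(raw: bytes) -> list[bytes]:
    """Return the header field lines of a buffered HTTP/1.1 request (hypothetical helper)."""
    # The header section ends at the first empty line, i.e. CRLF CRLF.
    head, sep, _body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header section")
    # The first line is the request line; the remaining lines are header fields.
    _request_line, *field_lines = head.split(b"\r\n")
    return field_lines
```

Continuation lines (a line starting with a space or tab) would still need separate handling after this split.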

Also, your code doesn’t fix the issue where a header name containing whitespace is accepted (which may violate expectations, depending on the server).

Your pseudocode also doesn’t handle the edge case where two headers that normalize to the same stripped text collide. One HTTP request smuggling vector is the front-end server keeping a different header value than the back-end server when two header names collide.
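
A sketch of one defense against such collisions, assuming the lines have already been split into (name, value) pairs: keep every value per case-folded name instead of letting one line overwrite another, then refuse ambiguous framing outright. RFC 7230 (sections 3.3.2 and 3.3.3) lets a recipient reject repeated Content-Length fields and says a message carrying both Transfer-Encoding and Content-Length ought to be handled as an error. `collect_headers` is a hypothetical name.

```python
from collections import defaultdict

def collect_headers(fields: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group values by lower-cased name, rejecting ambiguous framing (sketch)."""
    headers: defaultdict[str, list[str]] = defaultdict(list)
    for name, value in fields:
        headers[name.lower()].append(value)
    # Several Content-Length fields: each hop might pick a different one.
    if len(headers.get("content-length", [])) > 1:
        raise ValueError("multiple Content-Length fields")
    # Both framing headers present: a classic request smuggling setup.
    if "content-length" in headers and "transfer-encoding" in headers:
        raise ValueError("both Content-Length and Transfer-Encoding present")
    return dict(headers)
```

Rejecting is deliberately stricter than picking the first or last value, since any choice risks disagreeing with whatever sits on the other side of the proxy.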


> since "Content-Length : 0" is valid.

According to which spec? RFC 7230 allows optional whitespace (OWS) after the colon, but not before it:

   header-field   = field-name ":" OWS field-value OWS
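
A sketch of a parser that follows the grammar above: the field name must be a token (the `tchar` characters from RFC 7230, section 3.2.6), so neither a space before the colon nor a space inside the name is accepted, and only SP/HTAB around the value is dropped. The value check is deliberately simplified (it only rejects control characters other than HTAB), and `parse_field_line` is a hypothetical name.

```python
import re

# tchar from RFC 7230, section 3.2.6: a field name is one or more of these.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# header-field = field-name ":" OWS field-value OWS, where OWS = *( SP / HTAB )
FIELD_LINE = re.compile(rf"({TOKEN}):[ \t]*(.*?)[ \t]*")

def parse_field_line(line: str) -> tuple[str, str]:
    """Strictly parse one header line, raising ValueError on anything off-grammar."""
    match = FIELD_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed header line: {line!r}")
    name, value = match.groups()
    # Simplified field-value check: no control characters except HTAB.
    if any((ord(c) < 0x20 and c != "\t") or ord(c) == 0x7F for c in value):
        raise ValueError(f"control character in value of {name!r}")
    return name.lower(), value
```

With this, `Content-Length : 0` and `Content-Length abcd: 0` are both rejected, while `Content-Length-abcd: 0` parses as its own, unrelated header.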


You definitely shouldn't strip the left side of the header, since leading whitespace is the syntax for continuing a header over multiple lines, at least in email. I'm not sure whether this applies to HTTP, but some parsers may handle it anyway, and some don't.

It just shows how easy it is to get things wrong by being lazy with HTTP parsing.
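
For what it's worth, HTTP/1.1 did inherit this: RFC 7230 (section 3.2.4) calls such a continuation an obs-fold, deprecates it, and tells a server that receives one in a request to either reject the message with 400 (Bad Request) or replace each obs-fold with one or more spaces before interpreting the value. A sketch of the reject option, assuming the header section has already been split into field lines; `reject_obs_fold` is a hypothetical name.

```python
def reject_obs_fold(field_lines: list[str]) -> list[str]:
    """Refuse continuation lines instead of guessing how the next hop joins them."""
    for i, line in enumerate(field_lines):
        # A field line beginning with SP or HTAB continues the previous one (obs-fold).
        if line[:1] in (" ", "\t"):
            raise ValueError(f"obs-fold (line folding) at field line {i}")
    return field_lines
```

Rejecting is the simpler of the two options the RFC allows, and it avoids having to match however the next hop unfolds the value.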


I'm guessing they just check what each line starts with, then split the line on the colon to get the value. That would produce the results seen.
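
A sketch of that guess, assuming the proxy matches framing headers with a case-insensitive prefix test and takes the text after the first colon as the value. `lookup_by_prefix` and `lookup_by_equality` are hypothetical names; the second applies the equality check recommended earlier in the thread to the same lax colon split.

```python
from typing import Optional

def lookup_by_prefix(lines: list[str], name: str) -> Optional[str]:
    """Hypothesized buggy lookup: any line that merely starts with the name matches."""
    for line in lines:
        if line.lower().startswith(name.lower()):
            # The value is whatever follows the first colon.
            return line.partition(":")[2].strip()
    return None

def lookup_by_equality(lines: list[str], name: str) -> Optional[str]:
    """Same lax colon split, but the stripped name must equal the target exactly."""
    for line in lines:
        header, sep, value = line.partition(":")
        if sep and header.strip().lower() == name.lower():
            return value.strip()
    return None

lines = ["Host: example.com", "Content-Length abcd: 5"]
print(lookup_by_prefix(lines, "Content-Length"))    # 5
print(lookup_by_equality(lines, "Content-Length"))  # None
```

The prefix test would also accept `Content-Length-abcd: 5`, so under this guess both kinds of extension from the top of the thread would work.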

It shows just how careful you have to be when writing code that is Internet-facing, especially at the scale of AWS, where half the world's hackers are trying to find exploits.

I'm not even looking for exploits, and I find them every day. For instance, I wanted to read some magazines the other day, but they were behind a paywall. Just to see what was behind the wall, I checked for a sitemap file: a 35 MB sitemap.xml contained direct links to the full downloads of every item, with no auth needed.


> It shows just how careful you have to be when writing code that is Internet-facing

All code. “Internet-facing” is not the only relevant qualification.

Any code that parses user-generated input should be carefully written, tested, and documented. Edge cases should be identified and described in specs. Non-compliant software should be identified and shamed (or, preferably, fixed with a PR).

I know AWS already patched some HTTP request smuggling attacks, maybe three years ago, but I don’t remember whether it was the same AWS feature (the previous one might have been CloudFront), and the parsing error might have been a little different.


Probably backwards compatibility with some ancient web server from the dawn of HTTP, which became frozen into the protocol forever. Everyone who proposes a fix runs into some graybeard who worries about xkcd-spacebar'ing someone out there. It's probably no longer relevant, but standards bodies being what they are, it's difficult to accept any risk.



