Can you explain why 'validating it with grammar' is better than 'some adhoc parser'?
IMHO a webserver is one of the things you want to be relaxed, laid back, and basically not care if the client gets things wrong. Just serve up what it looks like they wanted.
Many of the security attacks on Apache (and other web servers) were based on invalid requests.
The original Mongrel was known for its strict request parser, which didn't let many of those same security attacks through. In fact, in the Ruby world, many non-Mongrel web servers reused the Mongrel parser for that very reason.
As long as you don't get into the algorithms it's pretty simple.
A hand-written HTTP parser is kind of like writing a "black-list" of what the server rejects. Since there's no algorithm backing it, the only thing you can do is list out all the things you can think of, or have run into, that are "wrong".
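To make that concrete, here's a minimal, purely illustrative sketch (not Mongrel's code) of the black-list shape a hand-written check tends to take: you enumerate the bad inputs you've already thought of, and everything else is waved through.

```python
# Hypothetical ad-hoc, black-list style check (illustration only).
# It can only reject the bad things its author has already thought of.
def adhoc_request_line_ok(line: str) -> bool:
    if ".." in line:          # path traversal we got burned by once
        return False
    if "\x00" in line:        # embedded NUL bytes
        return False
    if len(line) > 8192:      # absurdly long request lines
        return False
    # Anything not on the list sails through: malformed methods, bare
    # control characters, encoding tricks nobody has run into yet...
    return True
```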
Using a parser (well, a lexer really) like Ragel, I can make something that's relaxed, but it's more of a white-list of what it accepts. The algorithm explicitly says that this particular set of characters in this grammar is all I'll answer to.
If you then write the grammar so that it handles 99% of the requests you run into in the wild, you get the same relaxed quality as a hand-written parser, but it explicitly drops the 1% that are invalid or are usually hacks.
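As a loose illustration of the white-list idea (a plain regular expression standing in for the real Ragel grammar, so treat it as a sketch only): the grammar names exactly what gets accepted, and everything outside it is rejected by construction.

```python
import re

# Illustrative white-list grammar for a request line: only strings
# matching this pattern are accepted; everything else is rejected.
REQUEST_LINE = re.compile(
    r"(GET|HEAD|POST|PUT|DELETE|OPTIONS)"   # the methods we answer to
    r" (/[^ \x00-\x1f]*)"                   # an origin-form target, no control chars
    r" HTTP/1\.[01]"                        # the versions we speak
)

def parse_request_line(line: str):
    m = REQUEST_LINE.fullmatch(line)
    if m is None:
        return None                          # not in the grammar -> dropped
    return m.group(1), m.group(2)            # (method, target)
```

So `parse_request_line("GET /index.html HTTP/1.1")` comes back as `("GET", "/index.html")`, while a request line with a stray NUL byte or a made-up protocol version simply returns `None`, without anyone having to anticipate that specific trick.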
This is also the same Ragel-based parser that powers a large number of web servers in multiple languages, so it's proven to work.
Yeah, that is what Microsoft did (does). The result is most of the requests "look like" a desire to serve up viruses or spam.
A grammar is theoretically provable (yes, that is a double entendre). An ad-hoc implementation is not provable and exhaustively testing its validity is unrealistic for anything but trivial grammars.
Sorry but I'm even more confused now. What are we proving?
"The result is most of the requests "look like" a desire to serve up viruses or spam."
I have no idea what you mean by that.
HTTP is a trivial grammar. The parser is the simple bit. What you do with the headers and how you respond to them is the more interesting bit.
Why would rejecting invalid requests be desirable? Why not just serve up what we think they want? (Of course there are levels of 'invalid'. Reject the crazies, but allow some.)
With a parser that implements a grammar, you can prove that (a) it accepts every string that is valid as defined by the grammar and (b) it rejects every string that is invalid. Specifying a grammar is relatively straightforward (hopefully). Proving that an ad-hoc parser does (a) and (b) is nearly impossible.
Ad-hoc parsers can be shown to accept all "OK" strings that somebody used to test the parser and to reject all "not OK" strings that somebody used to test the parser.[1] "The problem with idiots (and black-hats) is that they are so ingenious." The only way to prove that an ad-hoc parser is truly correct is to run all possible strings through it, complete with a priori knowledge of which strings are OK and which are to be rejected. This is an O(infinite) problem (essentially the halting problem: http://en.wikipedia.org/wiki/Halting_problem).
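To illustrate the asymmetry with a toy example (nothing to do with Mongrel's actual parser): with a grammar, membership is the definition, so there is nothing left to test; with an ad-hoc check you can only sample inputs and hope you sampled the one it gets wrong.

```python
import re

# The grammar: an unsigned decimal number field, nothing else.
GRAMMAR = re.compile(r"[0-9]+")

def grammar_accepts(s: str) -> bool:
    return GRAMMAR.fullmatch(s) is not None   # membership IS the definition

def adhoc_accepts(s: str) -> bool:
    # A perfectly reasonable-looking hand-rolled version...
    return s != "" and s.isdigit()

# ...which quietly disagrees on inputs nobody happened to test:
for s in ["42", "", "4 2", "٤٢", "²"]:
    print(repr(s), grammar_accepts(s), adhoc_accepts(s))
# '٤٢' (Arabic-Indic digits) and '²' (superscript two) pass the ad-hoc
# check but are not in the grammar; you only find out if you test them.
```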
Guessing intent is a wormhole: how close does the request need to be? What if you guess wrong?
The combination of ad-hoc parsers with guessing intent is a potent way to introduce security flaws in your program. In the case of a web server, the "attack surface" is the whole internet, i.e. there is a huge number of idiots and black-hats that could potentially attack your program.
[1] War story: in a previous life, the company decided they needed a custom code-standards checker program (the result of a chain of four or five decisions, all of them really stupid, but that is a different war story). They contracted out the creation of the program, complete with a requirement that the contractor write the test cases (fox in the hen house). The program was a POS (how did you know that was coming???).
When I looked at the test cases, they had one "positive" test case (i.e. it catches a "bad" construct) and NO "negative" test cases (i.e. checking that it doesn't produce false positives). As a result, when run on real code, the "standards checking" program was actively sabotaging good code!
The headers and such for even the most static requests still get used all over: dispatch, caches, logging, etc. The overhead is minuscule, especially compared to a hand-rolled parser that's literate enough to be maintainable.
And the purpose isn't to "block application-specific hacky-looking requests"; it only does that as a side effect. This isn't some inane IDS bullshit sold to PHBs: it's not looking for exploit signatures, it just validates all input as a consequence of being correct.
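As a rough, self-contained sketch of what "the headers still get used all over" means even for a static file (all the names below are invented for the example):

```python
import logging

# Invented stand-ins so the sketch runs on its own.
STATIC_FILES = {"/": b"<h1>hello</h1>"}
ETAGS = {"/": '"abc123"'}

def handle(method: str, path: str, headers: dict):
    """Even a static GET consults the parsed headers several times."""
    host = headers.get("host", "default")            # dispatch / virtual hosting

    if headers.get("if-none-match") == ETAGS.get(path):
        return 304, b""                              # cache revalidation

    logging.info("%s %s %s %s", host, method, path,
                 headers.get("user-agent", "-"))     # access logging

    body = STATIC_FILES.get(path)
    return (200, body) if body is not None else (404, b"not found")

print(handle("GET", "/", {"host": "example.com", "user-agent": "curl/8.0"}))
print(handle("GET", "/", {"if-none-match": '"abc123"'}))
```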
It's quite simple really. Do you like your compiler? Or would you rather write code and hope for the best? Compilers work because they have a formal grammar of what is and is not acceptable in the programming language. This same principle is being applied to handling web requests. We have a standard - HTTP - and any requests that don't conform to the protocol are immediately rejected by Mongrel2. Since many attacks against web servers involve sending improper web requests, this sort of approach simply rejects those requests and doesn't even begin to process them. This certainly doesn't prevent Mongrel2 from implementing proper security at other appropriate places in the code. It simply stops a whole lot of potential exploits before they start.
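A hedged sketch of that "reject before you process" shape in plain Python (an invented, simplified grammar check standing in for Mongrel2's actual Ragel-generated parser):

```python
import re

# Simplified stand-in for the protocol grammar.
REQUEST_LINE = re.compile(r"(GET|HEAD|POST) /[^ \x00-\x1f]* HTTP/1\.[01]")

def handle_raw_request(raw: bytes) -> bytes:
    try:
        line = raw.split(b"\r\n", 1)[0].decode("ascii")
    except UnicodeDecodeError:
        return b"HTTP/1.1 400 Bad Request\r\n\r\n"   # not even ASCII: reject

    if REQUEST_LINE.fullmatch(line) is None:
        # Doesn't conform to the grammar: never processed at all, so
        # whatever exploit it was carrying never gets a chance to run.
        return b"HTTP/1.1 400 Bad Request\r\n\r\n"

    return dispatch(line)                            # only valid requests get here

def dispatch(line: str) -> bytes:
    return b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
```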
HTTP requests come from millions of different browsers. Some with bugs, some with idiot creators, etc etc.
My point was that an HTTP request parser is trivial to write correctly. What you do with the headers and the request later on is where you sometimes need to be careful.
TBH, though, I think I'm just in a different world from all of this Mongrel stuff.