
Isn't the mongrel/thin/unicorn HTTP parser library built using Ragel (i.e. an existing parsing library)? Could you expand on why you think it's not possible to handle HTTP with existing tools?



Ragel is a tool for writing state machines and automata.

Many parsers are written as automata, but that doesn't mean Ragel is in the same category as parsing tools based on LR, LL, GLR, PEG or CFG approaches.


Yes, but the generated state machine can only recognize a regular language, so if the automaton can recognize HTTP, that would mean HTTP is a regular language, and existing parsing tools should be able to deal with it. Or am I missing something?


Regular expressions (in the CS sense) are equivalent to finite state machines, and regular expressions can't count or match nested ()'s. Ragel lets you mix arbitrary code into the state machine, so it is actually far, far more powerful than a plain finite state machine.
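
To make the "can't count" point concrete, here is a minimal sketch in plain Python (not Ragel; the balanced() helper is purely illustrative). A pure finite state machine has a fixed number of states and so can't track arbitrary nesting depth, but the moment you embed an action that maintains a counter, matching any depth is trivial:

    # Plain Python, not Ragel: a sketch of embedding an "action" (the depth
    # counter) into what is otherwise a character-by-character scan. A pure
    # finite state machine has finitely many states, so it cannot track
    # arbitrary nesting depth; the embedded counter can.
    def balanced(s: str) -> bool:
        depth = 0                    # extra state carried by action code
        for ch in s:
            if ch == '(':
                depth += 1           # action on the '(' transition
            elif ch == ')':
                depth -= 1           # action on the ')' transition
                if depth < 0:        # ')' with nothing open
                    return False
        return depth == 0

    assert balanced("((a)(b))")
    assert not balanced("((a)")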

In HTTP, handling things like 'Content-Length: %d' and then reading a body of that length is a little harder, as is handling chunked transfer encoding. HTTP is quite fiendish in places :-)

These are "data dependent": the parse stream contains information (i.e. a length) about the subsequent tokens. Although some regexp engines have back references, such features aren't commonplace in parsing tools/formalisms like LR, LL, LALR, CFG or PEGs.
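
As a rough illustration of that data dependence (plain Python, nothing like the real mongrel/unicorn parser; read_message() and the example bytes are made up): the header block itself can be scanned by a simple machine, but the number of body bytes to consume comes out of the input data.

    import io

    # Sketch only: scan header lines up to the blank line, then read exactly
    # Content-Length bytes of body. The length is taken from the input stream
    # itself, which is the data dependence a purely regular recognizer can't
    # express.
    def read_message(stream):
        headers = {}
        while True:
            line = stream.readline().rstrip(b"\r\n")
            if not line:                      # blank line ends the headers
                break
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get(b"content-length", b"0"))
        body = stream.read(length)            # byte count driven by the data
        return headers, body

    raw = b"Content-Length: 5\r\nHost: example.com\r\n\r\nhelloGET /next..."
    hdrs, body = read_message(io.BytesIO(raw))
    assert body == b"hello"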

My point is simply that a lot of the parsing drama of late has revolved around the comparatively simple task of parsing a language, rather than parsing network protocols.

There is a larger class of parsing problems still to be tackled.


Ah, now I get what you meant, thanks for the explanation.



