
Isn't the mongrel/thin/unicorn HTTP parser library built using Ragel (i.e. an existing parsing library)? Could you expand on why you think it's not possible to handle HTTP with existing tools?



Ragel is a tool for writing state machines and automata.

Many parsers are written as automata, but that doesn't mean Ragel is in the same category as parsing tools based on LR, LL, GLR, PEG or CFG approaches.


Yes, but the generated state machine can only recognize a regular language, so if the automaton can recognize HTTP, that would mean HTTP is a regular language, and existing parsing tools should be able to deal with it. Or am I missing something?


Regular expressions (in the CS sense) are equivalent to finite state machines, and regular expressions can't count or match nested ()'s. Ragel lets you mix arbitrary code into the state machine, so it is actually far, far more powerful than a plain finite state machine.
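
To make the "can't count" point concrete, here is a minimal sketch in plain Python (not Ragel; the balanced() helper is purely illustrative). A pure finite state machine has a fixed number of states and so can't track arbitrary nesting depth, but the moment you embed an action that maintains a counter, matching any depth is trivial:

    # Plain Python, not Ragel: a sketch of embedding an "action" (the depth
    # counter) into what is otherwise a character-by-character scan. A pure
    # finite state machine has finitely many states, so it cannot track
    # arbitrary nesting depth; the embedded counter can.
    def balanced(s: str) -> bool:
        depth = 0                    # extra state carried by action code
        for ch in s:
            if ch == '(':
                depth += 1           # action on the '(' transition
            elif ch == ')':
                depth -= 1           # action on the ')' transition
                if depth < 0:        # ')' with nothing open
                    return False
        return depth == 0

    assert balanced("((a)(b))")
    assert not balanced("((a)")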

In HTTP, handling things like 'Content-Length: %d' and then reading a body of that length is a little harder, as is handling chunked transfer encoding. HTTP is quite fiendish in places :-)

These are "data dependent": the parse stream contains information (i.e. a length) about the subsequent tokens. Although some regexp engines have back references, such features aren't commonplace in parsing tools/formalisms like LR, LL, LALR, CFG or PEGs.
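
As a rough illustration of that data dependence (plain Python, nothing like the real mongrel/unicorn parser; read_message() and the example bytes are made up): the header block itself can be scanned by a simple machine, but the number of body bytes to consume comes out of the input data.

    import io

    # Sketch only: scan header lines up to the blank line, then read exactly
    # Content-Length bytes of body. The length is taken from the input stream
    # itself, which is the data dependence a purely regular recognizer can't
    # express.
    def read_message(stream):
        headers = {}
        while True:
            line = stream.readline().rstrip(b"\r\n")
            if not line:                      # blank line ends the headers
                break
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get(b"content-length", b"0"))
        body = stream.read(length)            # byte count driven by the data
        return headers, body

    raw = b"Content-Length: 5\r\nHost: example.com\r\n\r\nhelloGET /next..."
    hdrs, body = read_message(io.BytesIO(raw))
    assert body == b"hello"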

My point is simply that a lot of the parsing drama of late has revolved around the comparatively simple task of parsing a language, rather than parsing network protocols.

There is a larger class of parsing problems still to be tackled.


Ah, now I get what you meant, thanks for the explanation.



