Yeah, Parsec is great - I used it to parse some shading languages. I have though...

jrockway · on March 20, 2009

You could use a "smart" lexer that divides the incoming stream into functions (based on counting "{" and "}") and run Parsec separately on each function.

You can't really divide the stream without parsing it, due to things that change the meaning of "{" and "}", like /* and ".

miloshh · on March 20, 2009

Yeah, but in a way a lexer is a parser, just a simpler one (so you can write it from scratch, without a library like Parsec). Most lexers can deal with comments and string constants already, so you could extend it to do "{" and "}" as well...
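To make that concrete, here is a minimal hand-rolled sketch of such a splitter in Haskell, using only base. The names `Mode` and `splitTopLevel` are made up for illustration, and it assumes C-style `/* */` comments and double-quoted strings with backslash escapes; it does not handle `//` comments or character literals. It also assumes the input is well-formed, so an unterminated comment or string will swallow the rest of the stream.

```haskell
import Data.Char (isSpace)

-- Scanner state: ordinary code, inside a /* */ comment, or inside a string literal.
data Mode = Code | Comment | Str

-- Split the source into top-level chunks, each ending at the "}" that
-- brings the brace depth back to zero. Braces inside comments and
-- string literals are not counted.
splitTopLevel :: String -> [String]
splitTopLevel = go Code (0 :: Int) []
  where
    -- acc holds the current chunk in reverse order.
    -- At end of input, emit leftover text only if it is not all whitespace.
    go _ _ acc [] = [reverse acc | any (not . isSpace) acc]
    go Code d acc ('/':'*':rest) = go Comment d ('*':'/':acc) rest
    go Code d acc ('"':rest)     = go Str d ('"':acc) rest
    go Code d acc ('{':rest)     = go Code (d + 1) ('{':acc) rest
    go Code d acc ('}':rest)
      | d <= 1    = reverse ('}':acc) : go Code 0 [] rest
      | otherwise = go Code (d - 1) ('}':acc) rest
    go Comment d acc ('*':'/':rest) = go Code d ('/':'*':acc) rest
    go Str d acc ('\\':c:rest)   = go Str d (c:'\\':acc) rest  -- keep escaped char
    go Str d acc ('"':rest)      = go Code d ('"':acc) rest
    go m d acc (c:rest)          = go m d (c:acc) rest
```

For example, `splitTopLevel "f(){/* } */}g(){\"{\"}"` yields two chunks, `f(){/* } */}` and `g(){"{"}`, because the braces inside the comment and the string are skipped. Each chunk could then be handed to a separate Parsec run.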

jrockway · on March 20, 2009

Even so, you can only do this correctly if the input contains no errors. If you try to recover from errors, you risk reporting errors that aren't actually errors (because the code would be correct if you took the skipped part into account). I'm sure you've worked with compilers that report multiple errors; sometimes you can trust all of the error messages, but most of the time, you fix the first error and the rest go away.

Parsing is hard.

a-priori · on March 20, 2009

The approach I tried (but couldn't get quite right; I ran out of time and abandoned it) was to create two new combinators, which I called "manyR" and "many1R". They behaved like Parsec's built-in "many" and "many1" combinators, except that on failure they would print the error (using unsafePerformIO) and attempt recovery.
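For illustration only, here is a rough guess at what a recovering "many" could look like. This is not the abandoned code: instead of printing via unsafePerformIO, this variant returns each skipped region as a `Left` value, and it takes the recovery parser as an explicit second argument. The toy grammar (`name = number;` statements) and the helper names `statement`, `skipBad` and `program` are invented for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Data.Functor (void)

-- Like 'many', but when the item parser fails, run a recovery parser
-- and keep going. The recovery parser must consume input when it
-- succeeds, otherwise 'many' would loop.
manyR :: Parser a -> Parser b -> Parser [Either b a]
manyR p recover = many ((Right <$> try p) <|> (Left <$> recover))

-- Toy item parser: a statement of the form "name = 123;".
statement :: Parser (String, Int)
statement = do
  name <- many1 letter
  spaces
  _ <- char '='
  spaces
  n <- many1 digit
  spaces
  _ <- char ';'
  spaces
  return (name, read n)

-- Recovery: skip to just past the next ';' (or to end of input) and
-- report where the skipped text started and what it was.
skipBad :: Parser (SourcePos, String)
skipBad = do
  pos <- getPosition
  notFollowedBy eof  -- fail without consuming at end of input
  skipped <- manyTill anyChar (void (char ';') <|> eof)
  spaces
  return (pos, skipped)

program :: Parser [Either (SourcePos, String) (String, Int)]
program = spaces *> manyR statement skipBad <* eof
```

Running `parse program "" "x = 1; y = ; z = 3;"` should give a `Right` for `x`, a `Left` carrying the skipped text `"y = "` with its position, and a `Right` for `z`. Note that the `Left` only says what was skipped, not why the statement failed; getting at the underlying ParseError would need Parsec's lower-level interface.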

I really should try to get those working, because I think they would be useful additions to Parsec.