
Yeah, Parsec is great - I used it to parse some shading languages. I have thought about the error recovery problem, and I think for some C-like languages there might be a solution, although I never tried it. You could use a "smart" lexer that divides the incoming stream into functions (based on counting "{" and "}") and run Parsec separately on each function. I can't think of anything much better...



> You could use a "smart" lexer that divides the incoming stream into functions (based on counting "{" and "}") and run Parsec separately on each function.

You can't really divide the stream without parsing it, because some tokens change the meaning of "{" and "}": a brace inside a /* ... */ comment or inside a string literal must not be counted.


Yeah, but in a way a lexer is a parser, just a simpler one (so you can write it from scratch, without a library like Parsec). Most lexers already deal with comments and string constants, so you could extend one to track "{" and "}" as well...
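
A rough sketch of what such a brace-counting splitter might look like, written from scratch with no parser library. The name splitTopLevel and the exact rules (C-style /* */ and // comments, backslash escapes inside quotes, a stray "}" at depth 0 simply closing the current chunk) are assumptions made for illustration, not anything a real lexer is known to do:

```haskell
import Data.Char (isSpace)

-- Split C-like source into top-level chunks. A chunk ends at the "}" that
-- brings the brace depth back to zero. Braces inside comments, string
-- literals and character literals are copied through but not counted.
splitTopLevel :: String -> [String]
splitTopLevel = go (0 :: Int) []
  where
    -- d: current brace depth; acc: current chunk, built in reverse
    go _ acc [] = flush acc
    go d acc ('/' : '*' : rest) =
      let (skipped, rest') = blockComment rest
       in go d (reverse ("/*" ++ skipped) ++ acc) rest'
    go d acc ('/' : '/' : rest) =
      let (skipped, rest') = break (== '\n') rest
       in go d (reverse ("//" ++ skipped) ++ acc) rest'
    go d acc (q : rest)
      | q == '"' || q == '\'' =
          let (skipped, rest') = quoted q rest
           in go d (reverse (q : skipped) ++ acc) rest'
    go d acc ('{' : rest) = go (d + 1) ('{' : acc) rest
    go d acc ('}' : rest)
      | d <= 1 = flush ('}' : acc) ++ go 0 [] rest
      | otherwise = go (d - 1) ('}' : acc) rest
    go d acc (c : rest) = go d (c : acc) rest

    -- Emit the accumulated chunk unless it is only whitespace.
    flush acc = [dropWhile isSpace (reverse acc) | any (not . isSpace) acc]

    -- Consume up to and including the closing "*/".
    blockComment ('*' : '/' : rest) = ("*/", rest)
    blockComment (c : rest) = let (s, r) = blockComment rest in (c : s, r)
    blockComment [] = ([], [])

    -- Consume up to and including the closing quote q, honouring backslash escapes.
    quoted q ('\\' : c : rest) = let (s, r) = quoted q rest in ('\\' : c : s, r)
    quoted q (c : rest)
      | c == q = ([c], rest)
      | otherwise = let (s, r) = quoted q rest in (c : s, r)
    quoted _ [] = ([], [])
```

Each chunk could then be handed to a separate Parsec run. This only behaves well when comments, quotes and braces are themselves balanced; an unterminated string or a missing "}" will still shift every later chunk boundary.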


You can still only do this correctly if the input contains no errors. If you try to recover from errors, you risk reporting errors that aren't actually errors: the later code is correct once you take into account the erroneous part you skipped, but the parser no longer has that context. I'm sure you've worked with compilers that report multiple errors; sometimes you can trust all of the error messages, but most of the time you fix the first error and the rest go away.

Parsing is hard.


The approach I tried (but couldn't get quite right before I ran out of time and abandoned it) was to create two new combinators, which I called "manyR" and "many1R". These behaved like Parsec's built-in "many" and "many1" combinators, except that on failure they would print the error (using unsafePerformIO) and attempt recovery.
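
For anyone curious, here is one way a combinator in that spirit could be sketched. This is not the code described above: instead of printing with unsafePerformIO it collects errors in Parsec's user state, it keeps going until end of input rather than stopping where "many" would, and the helper names (attempt, skipToSemi, stmt, parseAll) and the toy "identifier;" grammar are all made up for the example:

```haskell
import Text.Parsec

-- Parsers here carry the list of recovered errors as user state.
type P a = Parsec String [ParseError] a

-- Run p as a sub-parse on the remaining input so that a failure comes back
-- as a value instead of aborting the whole parse. On success the outer
-- parser adopts the sub-parse's final state; on failure nothing is consumed.
attempt :: P a -> P (Either ParseError a)
attempt p = do
  st <- getParserState
  let sub = setParserState st >> ((,) <$> p <*> getParserState)
  case runParser sub (stateUser st) "" (stateInput st) of
    Left err -> return (Left err)
    Right (x, st') -> setParserState st' >> return (Right x)

-- Like "many", but when an item fails: record the error, run the recovery
-- parser, and carry on. The recovery parser must consume input whenever
-- input remains, otherwise this loops forever.
manyR :: P a -> P () -> P [a]
manyR item recover = loop
  where
    loop = (eof >> return []) <|> do
      r <- attempt item
      case r of
        Right x -> (x :) <$> loop
        Left err -> do
          modifyState (err :)
          recover
          loop

-- Toy recovery rule (an assumption): skip past the next ";".
skipToSemi :: P ()
skipToSemi = skipMany (noneOf ";") >> optional (char ';') >> spaces

-- Toy grammar (an assumption): an identifier followed by ";".
stmt :: P String
stmt = many1 letter <* char ';' <* spaces

-- Returns the statements that parsed plus the errors that were recovered from.
parseAll :: String -> Either ParseError ([String], [ParseError])
parseAll = runParser p [] "<input>"
  where
    p = do
      spaces
      xs <- manyR stmt skipToSemi
      errs <- getState
      return (xs, reverse errs)
```

With this sketch, parseAll "foo; b@r; baz;" keeps "foo" and "baz" and records a single error for the middle statement. It does nothing about the cascading-error problem raised earlier in the thread: a bad recovery point still produces bogus follow-on errors.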

I really should try to get those working, because I think they would be useful additions to Parsec.



