Language-python: A Python Parser Written in Haskell

almost · on March 19, 2009

This looks fun :) I can think of all kinds of fun uses for it already...

I wonder why they choose Alex/Happy instead of Parsec though. From my, admittedly limited, viewpoint Parsec looks brilliant for parsers that can be reused and combined in interesting ways. My experience with it is limited to having done a piece of coursework in it for a compilers course (took me a few hours to do the coursework for the entire course vs. a lot longer for people using other tools).

Anyone who knows a little more about these things who can comment on the pros and cons of Parsec that I may have missed? Is it just that performance isn't as good or is there more to it?

a-priori · on March 19, 2009

I'm currently in the process of writing a C-Minus compiler in Haskell using Parsec for a compilers course. I'm working on the code generation phase now, so I'm mostly finished dealing with Parsec.

Parsec is indeed very nice, and I highly recommend it for pretty much any kind of parsing task. It certainly beats the pants off flex & bison. The biggest drawback I've found is that it has no support for error recovery. Once an error has been detected, Parsec prints it and aborts. It would be nice if you could make it recover by skipping to the next language element and accumulate the errors.

miloshh · on March 19, 2009

Yeah, Parsec is great - I used it to parse some shading languages. I have thought about the error recovery problem, and I think for some C-like languages there might be a solution, although I never tried it. You could use a "smart" lexer that divides the incoming stream into functions (based on counting "{" and "}") and run Parsec separately on each function. I can't think of anything much better...

jrockway · on March 20, 2009

You could use a "smart" lexer that divides the incoming stream into functions (based on counting "{" and "}") and run Parsec separately on each function.

You can't really divide the stream without parsing it, due to things that change the meaning of "{" and "}", like /* and ".

miloshh · on March 20, 2009

Yeah, but in a way a lexer is a parser, just a simpler one (so you can write it from scratch, without a library like Parsec). Most lexers can deal with comments and string constants already, so you could extend it to do "{" and "}" as well...

jrockway · on March 20, 2009

You can still really only do this correctly if the input contains no errors. If you try to recover from errors, you risk generating errors that aren't actually errors (because the code is correct if you consider the erroneous part you skipped). I'm sure you've worked with compilers that generate multiple errors; sometimes you can trust all of the error messages, but most of the time, you fix the first error and the rest go away.

Parsing is hard.

a-priori · on March 20, 2009

The approach I tried taking (but couldn't get quite right, ran out of time and abandoned) was to create two new combinators, which I called "manyR" and "many1R". These behaved like Parsec's built-in "many" and "many1" combinators except that on failure they would print the error (using unsafePerformIO) and attempt recovery.

I really should try to get those working, because I think they would be useful additions to Parsec.