
I have very mixed feelings about using parser generators instead of hand-written parsers. I've contributed to GHC's parser grammar and spent a lot of time reading Rust's hand-written parser.

On one hand, a grammar is an invaluable tool. It helps keep you honest about ambiguities when you start thinking about adding language features. I think languages should always keep some reference grammar up to date.

That said, I don't think it is a good idea for compilers to use parser generators. A hand-written parser can have better error recovery and more helpful error messages. GHC's parser has some pretty bad parse error messages.

Some work was merged into Happy a while ago to support better messages (enabling you to see the expected tokens along with the bad token), but using that in GHC itself is blocked by how complex GHC's tokenizer has become.
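To make "expected tokens along with the bad token" concrete, here is a minimal sketch of how a hand-written parser can report both. It is a toy grammar for sums of integers, not GHC's or Happy's mechanism, and the names `ParseError`, `tokenize`, `expect` and `parse_sum` are made up for illustration.

```python
import re


class ParseError(Exception):
    """Raised with a message naming the bad token and the expected ones."""


def tokenize(text):
    # Split into integers and single-character symbols; whitespace is skipped.
    return re.findall(r"\d+|\S", text)


def expect(tokens, pos, expected):
    """Return the token at pos if it matches `expected`, else raise ParseError."""
    found = tokens[pos] if pos < len(tokens) else "end of input"
    ok = found in expected or ("number" in expected and found.isdigit())
    if not ok:
        wanted = ", ".join(sorted(expected))
        raise ParseError(
            f"position {pos}: unexpected {found!r}, expected one of: {wanted}"
        )
    return found


def parse_sum(tokens):
    # sum := number ('+' number)*
    pos = 0
    total = int(expect(tokens, pos, {"number"}))
    pos += 1
    while pos < len(tokens):
        expect(tokens, pos, {"+"})
        total += int(expect(tokens, pos + 1, {"number"}))
        pos += 2
    return total
```

With this sketch, `parse_sum(tokenize("1 + *"))` raises a `ParseError` whose message is `position 2: unexpected '*', expected one of: number`. Because the parser knows which rule it is in, it also knows exactly which tokens would have been acceptable at that point.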




> A hand-written parser can have better error recovery and more helpful error messages.

That depends on what you're comparing to. The Menhir parser generator can generate a file with all possible "error states" in the parser in which you can specify very nice and friendly error messages for each one.

I've posted this before: http://gallium.inria.fr/~fpottier/slides/fpottier-2015-11-ou... (ignore the French title, the rest is in English). The examples for a C grammar in Menhir show much better error messages than the hand-written GCC ones.


Grammars don't keep you entirely honest about ambiguities, because parser generators resolve some of them for you: Yacc, for example, resolves shift/reduce conflicts in favor of shift by default. It's possible to massage a Yacc grammar into working and then not entirely understand it.
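The classic case is the dangling else. As a rough sketch (a brute-force enumerator over a toy grammar, not how Yacc works internally; `parse_stmt` and `all_parses` are hypothetical names), the following lists every parse tree for a nested if statement:

```python
def parse_stmt(tokens, pos):
    """Yield (tree, next_pos) for every way to parse a statement at pos.

    Toy grammar (ambiguous on purpose):
        stmt := 's'
              | 'if' 'c' 'then' stmt
              | 'if' 'c' 'then' stmt 'else' stmt
    """
    if pos < len(tokens) and tokens[pos] == "s":
        yield "s", pos + 1
    if tokens[pos:pos + 3] == ["if", "c", "then"]:
        for then_branch, after_then in parse_stmt(tokens, pos + 3):
            # Alternative 1: if-then with no else.
            yield ("if", then_branch), after_then
            # Alternative 2: if-then-else, when an 'else' follows.
            if after_then < len(tokens) and tokens[after_then] == "else":
                for else_branch, after_else in parse_stmt(tokens, after_then + 1):
                    yield ("if", then_branch, else_branch), after_else


def all_parses(text):
    """Return every parse tree that consumes the whole input."""
    tokens = text.split()
    return [tree for tree, end in parse_stmt(tokens, 0) if end == len(tokens)]
```

Calling `all_parses("if c then if c then s else s")` returns two trees: one where the `else` belongs to the outer `if`, and one where it belongs to the inner `if`. Preferring shift, as Yacc does by default, corresponds to the second tree, where `else` attaches to the nearest `if`. That happens to be what most languages want, which is exactly why it is easy to accept the default without noticing the grammar was ambiguous.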

> A hand-written parser can have better error recovery and more helpful error messages.

A hand-written parser lets you put a breakpoint on the function which handles a given phrase rule, and when the execution stops there, you get a call stack which more or less corresponds to the top-to-bottom derivation, with arguments you can examine at every level.
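A small sketch of what that looks like, assuming a toy arithmetic grammar with one method per rule (the `Parser` class and its `snapshots` list are invented for this example). Instead of a debugger breakpoint, `factor` records the names of the rule functions currently on the call stack:

```python
import traceback

RULES = ("expr", "term", "factor")


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.snapshots = []  # (token, rule names on the call stack, outermost first)

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):
        # expr := term ('+' term)*
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self):
        # term := factor ('*' factor)*
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            value = self.expr()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')' at token {self.pos}")
            self.pos += 1
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number or '(' at token {self.pos}")
        # Stand-in for a breakpoint: capture which rule functions are active.
        stack = [f.name for f in traceback.extract_stack() if f.name in RULES]
        self.snapshots.append((tok, stack))
        self.pos += 1
        return int(tok)
```

Parsing the tokens of `2 * ( 3 + 4 )` with `Parser(...).expr()` evaluates to 14, and the snapshot taken at `3` shows the stack `expr, term, factor, expr, term, factor`, which is the derivation path from the start symbol down to that token. A table-driven parser has no equivalent: its call stack is just the driver loop, and the derivation lives in a state stack of integers.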


The difference is that a bottom-up parser generator can detect the ambiguity and complain about it in the first place. And anyone who wants to keep their sanity won't rely on the default yacc behavior and will take care to get rid of all the shift-reduce conflicts.


Since GHC was created, some very good parser combinator libraries have appeared, such as Parsec and uu-parsinglib. Haskell is a very good language with which to write a parser by hand. But no doubt it would be a big project to move off Happy, and success isn't guaranteed.



