Hacker News new | past | comments | ask | show | jobs | submit login

I used a PEG parser for a language I designed because I was attracted to the linear time parsing achievable using a packrat parser.

However, I also found that parsing is rarely a performance bottleneck so that wasn't a big plus.

PEGs are however are easier to reason about, no ambiguities, so that was a good enough reason.




> PEGs are however easier to reason about

Yes and no.

PEG parsers are definitely easier to implement than any of the LR(k)-and-ilk parsers. So if you're writing or debugging a PEG parser, that will be easier.

However, while shift-reduce conflicts are confusing, they are there to give the strong guarantee that the grammar is unambiguous. And the parser generator will tell you this as soon as you've defined the grammar, before you've even used it. PEG grammars instead remove the guarantee, and let you deal with any confusion that arises much later.

Here are some methods of reasoning that standard parsers will give you that PEG parers will not:

1. A ::= B | C is exactly the same as A ::= C | B.

2. If A ::= B | C and you have a program containing a fragment that parses as an "A" because it matched "B", then you can replace that fragment with something that matches "C" and it the program will still parse.

Neither of these rules hold in PEGs.

Here's a practical concern that (2) helps with. Say you have a grammar for html, and a grammar for js. And you want to be able to parse html with embedded JS. So you stick the js grammar into the html grammar at the right places. If you're using a standard (e.g. LR(k)) parser, and you don't get any shift-reduce (or other) conflicts, then the combined grammar works. In contrast, if you're using a PEG grammar, it's possible that you've ordered things wrong and there are valid JS programs that will never parse because they're clobbered by html parsing rules outside of them. Or vice-versa.

Also, realistically if you're using a PEG parser you'll want one that handles left recursion, because working without left recursion turns your grammar into a mess. And left recursion in PEGs can have some weird behavior.


> PEGs are however are easier to reason about, no ambiguities, so that was good enough reason.

I frequently hear this mentioned ("PEGs don't have ambiguity"). It is literally true, but I don't think it's true in the sense that actually matters.

I've blogged about this in the past (https://blog.reverberate.org/2013/09/ll-and-lr-in-context-wh...), but I'm not the only person saying this:

> PEG is not unambiguous in any helpful sense of that word. BNF allows you to specify ambiguous grammars, and that feature is tied to its power and flexibility and often useful in itself. PEG will only deliver one of those parses. But without an easy way of knowing which parse, the underlying ambiguity is not addressed -- it is just ignored.

https://jeffreykegler.github.io/Ocean-of-Awareness-blog/indi...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: