Hacker News new | past | comments | ask | show | jobs | submit login

I firmly disagree about the value of PEGs (and hence combinators) as a formalism that's ambiguous or hard to reason about. PEGs are as well-specified as CFGs, they just work in a different way. One of the strengths of PEGs is that they're never ambiguous. That does mean that order of alternates is important, and sometimes you do have to fiddle with that order when translating something like ABNF. But what you get in return is never having to deal with a parse forest, let alone punting on that problem by just reifying the greedy parse as the correct one.

I've written a tool that converts a declarative specification of a PEG grammar into a parser, although I haven't released it yet. It works great, I've used it on a number of real-world structured data formats.

Edit: cut out the first paragraph after checking the user name. Hi TQ.




"PEGs are never ambiguous" is equivalent to solving a problem by proclaiming the wrong solution to be correct.

In that sense, LALR parser generators (e.g. yacc) are also never ambiguous, because even if your grammar is ambiguous in the formal sense, yacc will produce a working parser with well-defined resolution of conflicts...

But the reality is,

    E = E `+` E
is intrinsically ambiguous, no matter how you want to spin it... the only difference is, that LALR generators point that out, whereas PEG generators sweep it under the rug.


I don't agree, as I've said, because PEG is a declarative formalism, and LALR is a parsing strategy. GLL implemented with graph-structured stacks can and will produce a parse forest for an ambiguous grammar, LALR will manifest one of those parses for the same grammar.

To lean on that means you have hidden information: your grammar is BNF + LALR, or BNF + GLL, not just BNF. PEGs, by contrast, are always PEGs. What you see is what you get.

A grammar in PEG format will always give you one parse, and which parse is predictable. If you want a different parse, you have to rewrite it. Indeed, as I'm sure you know, your example grammar isn't valid in the original PEG formalism, which prohibits immediate left recursion. Automatic rewrites into an intermediate rule are the leading method of allowing it.

Ambiguity is a well-defined concept in grammars, and PEGs aren't.

I've noticed that a lot of CFG enthusiasts don't like this about PEGs. They consider it inelegant, unprincipled. Some of that is aesthetic, some is unfamiliarity, and some is sunk cost: none of it actually engages with the formal expressive power of PEGs, nor their ergonomics as a practical tool for development.


Sure, I guess you're right, if you consider formal grammars as existing in their own abstract bubble removed from our reality.

I, on the other hand, am primarily interested in using grammars to parse programming languages, which are essentially a human form of communication (computers could use bitcode or lisp-style ASTs, no need for syntax).

I'm not an expert on PEGs, so this is copy-pasted from Wikipedia page on PEGs, I hope it's valid - the classical dangling else ambiguity.

     S ← 'if' C 'then' S 'else' S / 'if' C 'then' S
Now, there is an obvious ambiguity here, for a human reader (mediated, of course, by tabs). If you're using a formalism that wishes that ambiguity away, it just means that the formalism is a non-ideal one. What I like about LALR parser generators is, that they will explicitly alert you of this kind of human-level ambiguities.


It's okay that you aren't familiar with PEGs. I am familiar with PEGs, and so I know just by reading that, which way the ambiguity resolves. So for me, a human reader, there is no obvious ambiguity.

It's like saying there's an ambiguity in "not a and b". Sure, if you don't know the precedence assigned to 'not' and 'and' in your language. But you're supposed to learn those things.

Your not knowing how PEGs work is a weak argument against using them.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: