I firmly disagree about the value of PEGs (and hence combinators) as a formalism...

tomp · on Oct 22, 2020

"PEGs are never ambiguous" is equivalent to solving a problem by proclaiming the wrong solution to be correct.

In that sense, LALR parser generators (e.g. yacc) are also never ambiguous, because even if your grammar is ambiguous in the formal sense, yacc will produce a working parser with well-defined resolution of conflicts...

But the reality is,

    E = E `+` E

is intrinsically ambiguous, no matter how you want to spin it... the only difference is, that LALR generators point that out, whereas PEG generators sweep it under the rug.

samatman · on Oct 22, 2020

I don't agree, as I've said, because PEG is a declarative formalism, and LALR is a parsing strategy. GLL implemented with graph-structured stacks can and will produce a parse forest for an ambiguous grammar, LALR will manifest one of those parses for the same grammar.

To lean on that means you have hidden information: your grammar is BNF + LALR, or BNF + GLL, not just BNF. PEGs, by contrast, are always PEGs. What you see is what you get.

A grammar in PEG format will always give you one parse, and which parse is predictable. If you want a different parse, you have to rewrite it. Indeed, as I'm sure you know, your example grammar isn't valid in the original PEG formalism, which prohibits immediate left recursion. Automatic rewrites into an intermediate rule are the leading method of allowing it.

Ambiguity is a well-defined concept in grammars, and PEGs aren't.

I've noticed that a lot of CFG enthusiasts don't like this about PEGs. They consider it inelegant, unprincipled. Some of that is aesthetic, some is unfamiliarity, and some is sunk cost: none of it actually engages with the formal expressive power of PEGs, nor their ergonomics as a practical tool for development.

tomp · on Oct 22, 2020

Sure, I guess you're right, if you consider formal grammars as existing in their own abstract bubble removed from our reality.

I, on the other hand, am primarily interested in using grammars to parse programming languages, which are essentially a human form of communication (computers could use bitcode or lisp-style ASTs, no need for syntax).

I'm not an expert on PEGs, so this is copy-pasted from Wikipedia page on PEGs, I hope it's valid - the classical dangling else ambiguity.

     S ← 'if' C 'then' S 'else' S / 'if' C 'then' S

Now, there is an obvious ambiguity here, for a human reader (mediated, of course, by tabs). If you're using a formalism that wishes that ambiguity away, it just means that the formalism is a non-ideal one. What I like about LALR parser generators is, that they will explicitly alert you of this kind of human-level ambiguities.

samatman · on Oct 23, 2020

It's okay that you aren't familiar with PEGs. I am familiar with PEGs, and so I know just by reading that, which way the ambiguity resolves. So for me, a human reader, there is no obvious ambiguity.

It's like saying there's an ambiguity in "not a and b". Sure, if you don't know the precedence assigned to 'not' and 'and' in your language. But you're supposed to learn those things.

Your not knowing how PEGs work is a weak argument against using them.