Hacker News

The message you should really take away from meta-programming in LISP is that code is a message that can be read in different ways.

That's limited though by practical considerations such as the Turing and Gödel results, how your algorithms scale with problem size, etc.

There is room for a revolution in parsing tools, but it seems almost every attempt falls short. (PEG in Python is the latest; maybe they will expose a general-purpose interface to it someday, but by then certain performance concerns will be so baked in that features that expand use cases for parsing, such as automatic unparsing and round-tripping, won't materialize.)

We are still coding to 1970s-style 'callback hell' interfaces from yacc, when we should be able to write one grammar and get not just the parser but also a set of semantic structures. For instance, if you were parsing Java in Java, you'd have objects to represent expressions, operators, etc.
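To make the contrast concrete, here's a minimal sketch (in Python, with hypothetical node names) of what "one grammar, typed semantic structures for free" could look like for a toy expression language: the parser hands back dataclass objects directly, with no yacc-style semantic-action callbacks in sight.

```python
from dataclasses import dataclass

# Hypothetical semantic structures the tool would generate from the grammar.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def parse_expr(tokens, i=0):
    """expr <- NUM (('+' | '-') NUM)*  -- returns (node, next_index)."""
    node = Num(int(tokens[i]))
    i += 1
    while i < len(tokens) and tokens[i] in "+-":
        op = tokens[i]
        right = Num(int(tokens[i + 1]))
        node = BinOp(op, node, right)   # left-associative tree, built directly
        i += 2
    return node, i
```

Calling `parse_expr("1 + 2 - 3".split())` yields `BinOp('-', BinOp('+', Num(1), Num(2)), Num(3))` and the index 5 — the parse result *is* the semantic structure.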




> PEG in Python is the latest, [...]

The popularity of PEG baffles me. I guess it ties in nicely with the wider and older pattern of rejecting good formal language and parsing theory, but PEG seems like an all-around bad technology; like, is there even a single benefit to using PEG?

The difference between PEG and the usual grammars is that PEG gives special meaning to the "choice" operation so as to define away ambiguity. But defining away ambiguity actually just hides language constructions that will be ambiguous to users. Although PEG notation may be indistinguishable from the notation used to define CFGs, PEG sucks as a language-specification tool because it is unclear what language a given PEG defines (unlike with CFGs). From an implementation perspective, PEG parsers backtrack, meaning either exponential run time or (with packrat memoization) additional linear memory requirements.
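To illustrate how ordered choice hides strings: the rules `S <- 'a' S 'a' / 'aa'` read as a CFG generate every even-length string of a's, but read as a PEG they silently reject some of them. A sketch of the PEG semantics in Python (ordered choice: the second alternative is tried only if the first fails, and a local success is never revisited):

```python
def S(s, i):
    # Alternative 1: 'a' S 'a'
    if i < len(s) and s[i] == 'a':
        j = S(s, i + 1)
        if j is not None and j < len(s) and s[j] == 'a':
            return j + 1
    # Alternative 2: 'aa' -- only reached if alternative 1 failed
    if s[i:i + 2] == 'aa':
        return i + 2
    return None

def full_match(s):
    return S(s, 0) == len(s)

# full_match("aa")      -> True
# full_match("aaaa")    -> True
# full_match("aaaaaa")  -> False, even though the CFG generates it
```

Nothing in the notation warns you that the six-a string fell out of the language; you only find out by running the parser.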

Also some Jeffrey Kegler quotes:

> With PEG, what you see in the extended BNF is not what you get. PEG parsing has been called "precise", apparently based on the idea that PEG parsing is in a certain sense unambiguous. In this case "precise" is taken as synonymous with "unique". That is, PEG parsing is precise in exactly the same sense that Jimmy Hoffa's body is at a precise location. There is (presumably) exactly one such place, but we are hard put to be any more specific about the matter.

and

> When you do not know the language your parser is parsing, you of course have the problem that your parser might not parse all the strings in your language. That can be dealt with by fixing the parser to accept the correct input, as you encounter problems.

> A second, more serious, problem is often forgotten. Your PEG parser might accept strings that are not in your language. At worst, this creates a security loophole. At best, it leaves you with a choice: break compatibility, or leave the problem unfixed.

and

> PEG is not unambiguous in any helpful sense of that word. BNF allows you to specify ambiguous grammars, and that feature is tied to its power and flexibility and often useful in itself. PEG will only deliver one of those parses. But without an easy way of knowing which parse, the underlying ambiguity is not addressed -- it is just ignored.


> Is there even a single benefit to using PEG?

In a formal sense, there are some non-context-free languages that PEG can recognize (a^n b^n c^n is the classic example). It’s an open question whether there are any context-free languages PEG isn’t capable of recognizing.
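For the record, here's a sketch of that classic example in Python. Ford's PEG for a^n b^n c^n is `S <- &(A 'c') 'a'+ B !.` with `A <- 'a' A? 'b'` and `B <- 'b' B? 'c'`; the syntactic predicate `&(A 'c')` (match but consume nothing) is exactly what a CFG cannot express. Function names below are my own.

```python
def matchA(s, i):
    # A <- 'a' A? 'b'   (A? commits: if A succeeds, there is no backtracking past it)
    if i < len(s) and s[i] == 'a':
        j = matchA(s, i + 1)
        if j is None:
            j = i + 1
        if j < len(s) and s[j] == 'b':
            return j + 1
    return None

def matchB(s, i):
    # B <- 'b' B? 'c'
    if i < len(s) and s[i] == 'b':
        j = matchB(s, i + 1)
        if j is None:
            j = i + 1
        if j < len(s) and s[j] == 'c':
            return j + 1
    return None

def in_language(s):
    # S <- &(A 'c') 'a'+ B !.
    j = matchA(s, 0)                    # &(A 'c') lookahead: consumes no input
    if j is None or not (j < len(s) and s[j] == 'c'):
        return False
    i = 0
    while i < len(s) and s[i] == 'a':   # 'a'+
        i += 1
    if i == 0:
        return False
    return matchB(s, i) == len(s)       # B, then end of input (!.)

# in_language("aaabbbccc") -> True; in_language("aabbc") -> False
```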

Fundamentally, PEG is a formalization of the common practice of writing a recursive-descent parser: it defines languages by an exemplar algorithm that parses them, instead of the generative approach taken by BNF, which defines how you can enumerate the legal strings of the language. They’re both as mathematically rigorous as each other, but they approach the problem space from opposite sides.

For BNF, ambiguity is a question of whether the generation process is reversible: an ambiguous grammar has multiple derivation paths that produce the same string, which means there’s no way to recover the path used to generate a given string.
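A quick way to see this is to count derivations for the classically ambiguous grammar `E -> E '+' E | 'a'`: any string with more than one `+` has several parse trees, so the generating path is unrecoverable. A throwaway Python counter over split points:

```python
from functools import lru_cache

# Grammar: E -> E '+' E | 'a'   (classically ambiguous)
@lru_cache(maxsize=None)
def count_parses(s):
    n = 1 if s == "a" else 0            # E -> 'a'
    for i, ch in enumerate(s):          # E -> E '+' E, split at each '+'
        if ch == "+":
            n += count_parses(s[:i]) * count_parses(s[i + 1:])
    return n

# count_parses("a+a")     -> 1
# count_parses("a+a+a")   -> 2   (two trees: left- vs right-associated)
# count_parses("a+a+a+a") -> 5
```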

PEG is “unambiguous” because it’s not based on a generative model at all, so this definition of ambiguity is simply inapplicable. With PEG we know, by definition, how any given string will parse, but it can be hard to enumerate the strings that will parse in a given way. If PEG has an interesting notion of ambiguity (which it probably does), that’s the place to look for it.


For one thing, PEG offers the possibility of composability.

For a language like Python or Java, it shouldn't be much harder to add an "unless(X) {Y}" equivalent to "if(!X) {Y}" than it is in LISP, if the compiler were structured in such a way that you could take "Java Compiler A" and specialize it into "Java Compiler B" by adding one term to the grammar and one AST transformation that rewrites that term.

Exclusive of POM files, import statements, and other packaging material, that should be less than 50 lines of code if we built compilers to be extensible.
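A sketch of how small that extension could be, with hypothetical dataclasses standing in for a real compiler's AST: one new grammar term (`Unless`) plus one tree transformation that rewrites it away.

```python
from dataclasses import dataclass

# Hypothetical AST node classes standing in for a real compiler's tree.
@dataclass
class If:
    cond: object
    body: object

@dataclass
class Not:
    expr: object

@dataclass
class Unless:          # the one new grammar term
    cond: object
    body: object

def desugar(node):
    # The one AST transformation: unless(X) {Y}  ->  if(!X) {Y}
    if isinstance(node, Unless):
        return If(Not(node.cond), node.body)
    return node
```

`desugar(Unless("ready", "run()"))` comes back as `If(Not("ready"), "run()")`; everything downstream of this pass never knows `unless` existed.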

The ArangoDB database has a kind of SELECT statement that returns a list of unordered dictionaries, which is frustrating if you want to put the result in a pandas DataFrame. I wrote something in Python using PEG that would parse an ArangoDB query, statically analyze it, and determine the intended field order when it is definite.

I am up for any compiler technology that makes it easy to do that, even if it means I don't get compilation as fast as Go's.

Another issue that matters is error handling. Drools is a great rules engine on paper, but because it is composed with an external Java compiler you can often get into cases where the error messages don't make sense at all. It isn't like the Jena rules engine, which is simple enough that I could get quick results by looking at it in the debugger. That's a complicated example, but I think language users deserve better error handling, and compiler authors need better tools to provide it.


Lisp is certainly the most powerful/elegant meta-programming language out there.

But there is a nuance here:

> code is a message that can be read different ways

Think about it. When you read code directly, you become dependent on that code. And code can change for many reasons, including language changes and mere syntax changes.

For that reason, I like the approach of describing the required actions in a certain data format and then reading that data format in different ways, as opposed to reading code directly. Both in the same language, of course. This also relieves you from having to represent generic expressions, operators, etc.; you only need the subset that is relevant for you, which is most likely much more coarse-grained.

This is imho the cleaner approach, but it comes with the overhead of having to design a "sublanguage" in your language and then interpreting it in your language.
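A minimal sketch of the idea, with a made-up "actions" format: the same data is read two different ways by two interpreters, and neither interpreter has to represent generic expressions or operators — only the coarse-grained ops the format defines.

```python
# Hypothetical mini-DSL: actions described as plain data, not code.
program = [
    {"op": "set", "var": "x", "value": 2},
    {"op": "add", "var": "x", "value": 3},
]

def run(prog):
    """Read the data one way: execute it."""
    env = {}
    for step in prog:
        if step["op"] == "set":
            env[step["var"]] = step["value"]
        elif step["op"] == "add":
            env[step["var"]] += step["value"]
    return env

def describe(prog):
    """Read the same data another way: render it as text."""
    return [f'{s["op"]} {s["var"]} {s["value"]}' for s in prog]

# run(program)      -> {"x": 5}
# describe(program) -> ["set x 2", "add x 3"]
```

If the host language's syntax changes, `program` doesn't care; only the two small interpreters touch language-level code.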



