
Hey this is great work! What resources would you recommend for someone else who wants to build their own toy language for fun and practice?



It is hard to beat Prolog for whipping up interpreters. Like Lisp, you get to use symbols in terms, and you can also use Prolog's infix operators to create terms:

    % eval(+Expr, -Value): evaluate arithmetic terms built from numbers, + and *.
    eval(X,X) :- number(X).
    eval(X+Y,Z) :- eval(X,X1), eval(Y,Y1), Z is X1+Y1.
    eval(X*Y,Z) :- eval(X,X1), eval(Y,Y1), Z is X1*Y1.

    main :- write("Enter an arithmetic expression followed by a dot \".\" and a newline\n"),
            prompt(_Old,>),        % set the interactive prompt
            repl.

    % read a term, evaluate it, print the result, loop.
    repl :- read(Input),
            eval(Input,Answer),
            print(Answer), nl,
            repl.
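A sample session, roughly what you'd see under SWI-Prolog (exact output may vary by system; note the loop simply fails out on input that eval/2 can't handle):

    ?- main.
    Enter an arithmetic expression followed by a dot "." and a newline
    >2+3*4.
    14
    >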

You might like:

https://www.metalevel.at/lisprolog/

http://www.swi-prolog.org/

https://www.amazon.com/Art-Prolog-Advanced-Programming-Techn...


Evaluators; parsers too. Prolog is an amazing idea (even with its limitations). But when you come from the bit-banging side of computer programming it's such an alien idea.


For those interested in parsing with Prolog, check out Definite Clause Grammars and The Craft of Prolog. (A tiny DCG sketch follows the links below.)

http://www.learnprolognow.org/lpnpage.php?pagetype=html&page...

https://www.amazon.com/Craft-Prolog-Logic-Programming/dp/026...
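For a taste of the DCG notation, here is a minimal sketch (my own toy grammar, not taken from those links) that parses right-nested sums from a token list:

    % expr//1 and num//1 are DCG rules; phrase/2 runs them over a list.
    expr(X+Y) --> num(X), [+], expr(Y).
    expr(X)   --> num(X).
    num(N)    --> [N], { number(N) }.

Running it:

    ?- phrase(expr(T), [1,+,2,+,3]).
    T = 1+(2+3).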


Just a logical way of thinking. Many people don't know it, but Java bytecode verification is specified in Prolog.

https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.ht...
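For flavor: the type-checking rules in JVMS §4.10 really are written as Prolog clauses. Paraphrasing from memory (not verbatim from the spec), the subtyping rules look something like:

    isAssignable(X, X).
    isAssignable(oneWord, top).
    isAssignable(int, X) :- isAssignable(oneWord, X).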


Haaaa, I remember reading the spec before I knew Prolog... didn't realize what I was reading. Neat.


You can always try Erlang's pattern matching..


I've also been creating a programming language recently [1]. I'd recommend looking into the PyPy toolchain. It makes it pretty easy to hack together a decent interpreter (one that JITs). That, plus rply for the front end, gives you enough tooling to be dangerous. Here's a good tutorial [2].

[1] https://github.com/DanielWaterworth/plastic

[2] https://morepypy.blogspot.co.uk/2011/04/tutorial-writing-int...


Thank you! I've seen a lot of great articles and tutorials on HN about this subject, so you can probably find a lot of good resources doing a search here. To start I can link you to an article I wrote that teaches the basics of tokenizing and parsing: https://angularclass.com/making-a-mini-lisp-introduction-to-...


The classic way would be to start out with lex/flex and yacc.


Forget about lex, flex, bison, yacc and all that. It's way easier and more important to start with a hand written recursive descent parser.
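To show how little machinery that takes, here is a minimal hand-written recursive-descent sketch (in Prolog, to match the snippets upthread; token list in, AST out; all names are my own):

    % expr   -> term   ('+' term)*
    % term   -> factor ('*' factor)*
    % factor -> number | '(' expr ')'
    expr(Z, Ts0, Ts) :- term(X, Ts0, Ts1), expr_r(X, Z, Ts1, Ts).
    expr_r(X, Z, [+|Ts0], Ts) :- term(Y, Ts0, Ts1), expr_r(X+Y, Z, Ts1, Ts).
    expr_r(X, X, Ts, Ts).
    term(Z, Ts0, Ts) :- factor(X, Ts0, Ts1), term_r(X, Z, Ts1, Ts).
    term_r(X, Z, [*|Ts0], Ts) :- factor(Y, Ts0, Ts1), term_r(X*Y, Z, Ts1, Ts).
    term_r(X, X, Ts, Ts).
    factor(N, [N|Ts], Ts) :- number(N).
    factor(Z, ['('|Ts0], Ts) :- expr(Z, Ts0, [')'|Ts]).

    % ?- expr(T, [1,+,2,*,3], []).
    % T = 1+2*3.

Each nonterminal is just a predicate threading the remaining tokens, and operator precedence falls out of the expr/term/factor layering.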


Q: Why do those programs exist? Lex and yacc originally come from Bell Labs. Why wouldn't they use what you recommend instead?

And when I look at the credits for lex and yacc today I see some very capable programmers. Why do these programs exist? I may be biased somewhat as I use them regularly for small tasks, but my question is an honest one.

Also, someone else mentioned SWI-Prolog. I think gprolog is just as easy and is noticeably faster. Curious what HN readers think the advantages of SWI-Prolog are.


Because many don't know any better and are stuck in a 70s parser-tools mindset.

Tools like ANTLR[0], SableCC[1], JavaCC[2], and MPS[5], and approaches like attribute grammars[3] and parsing expression grammars[4], are much more suitable for the modern day.

I don't know gprolog, but SWI has GUI tooling and compilation, and I was already using it in 1997, so it is a very sound toolchain.

[0] - http://www.antlr.org/

[1] - https://github.com/SableCC/sablecc

[2] - https://javacc.java.net/doc/JJTree.html

[3] - https://wiki.haskell.org/Attribute_grammar

[4] - https://en.wikipedia.org/wiki/Parsing_expression_grammar

[5] - https://www.jetbrains.com/mps


I've tried ANTLR and it was a horrible experience compared to Menhir: being stuck with LL(n) causes a lot of unnecessary refactoring, and the syntax for anything out of the ordinary is arcane and, IMO, not well documented.

Give me the 70's tools any day.


You should of course use a tool that supports the grammar productions of the language being designed.

I just listed a few examples.


Menhir is hardly a 70s tool...


This is true of course. But it is ocamlyacc compatible and uses a separate lexer.

I'd at least say that it is using the lex/yacc approach, which I actually like.


LL(n) can be limiting indeed. But you can easily mix, say, PEG and Pratt (see the sketch below). Where else do you need LR-like parsing but in binary expressions, the C-style dot syntax, and so on?
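To make that concrete, here is a tiny precedence-climbing (Pratt-style) sketch in Prolog, the thread's lingua franca; the binop/2 table and all names are my own invention:

    % binop(Operator, Precedence); all left-associative here.
    binop(+, 1). binop(-, 1).
    binop(*, 2). binop(/, 2).

    parse(Tokens, Ast) :- expr(1, Ast, Tokens, []).

    expr(Min, Z, Ts0, Ts) :-
        primary(X, Ts0, Ts1),
        loop(Min, X, Z, Ts1, Ts).

    loop(Min, Acc, Z, [Op|Ts0], Ts) :-
        binop(Op, P), P >= Min, !,
        NextMin is P + 1,              % climb: right side binds tighter
        expr(NextMin, Y, Ts0, Ts1),
        Node =.. [Op, Acc, Y],
        loop(Min, Node, Z, Ts1, Ts).
    loop(_, Acc, Acc, Ts, Ts).

    primary(N, [N|Ts], Ts) :- number(N).

    % ?- parse([2,+,3,*,4], T).
    % T = 2+3*4.

The whole LR-ish part of the grammar fits in that one loop; everything else can stay plain recursive descent or PEG.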


I think this is kind of a big question, isn't it? You're in effect asking why people would write LL parsers versus LR.

The short answer might be that LL is simpler, and that handwritten recursive descent might be faster (especially if lexing and later stages are directly integrated into the parser) than tooled LALR parsers. LR is more expressive and can directly handle more grammars?


I think that, in the general case, the idea that recursive-descent parsers can only parse LL(k) languages is not quite true. For some theoretical conception of "parser", perhaps, but most real, hand-rolled recursive-descent parsers rarely recognize precisely an LL(k) language, because programmers tend to mix-and-match algorithms (mixing in a little shunting yard or precedence-climbing to handle infix operators, for example) and they tend to connect the parser with other modules in less restrictive ways (add in some two-way communication with the lexer and you can parse yourself a context-sensitive language, even!).

That's not to say I disagree with the general argument that there's no sense in shunning parser generators all the time. I've made good use of yacc (and ocamlyacc, and Menhir, and Lua LPEG, etc.) in the past, and the parser for our commercial compiler at work is written with ANTLR to generate an LL(*) parser in C++. Such metaprogramming tools let us work at a higher level of abstraction, and we get all the usual benefits of working at a higher level.

I think that, in most cases, there's really no good reason not to use a parser generator (or, alternatively, parser combinators). I don't even suspect that you learn anything in particular that you wouldn't learn by formalizing your grammar and specifying semantic actions for a parser generator (unless perhaps you aren't familiar with mutual recursion, that is...). However, there are some good reasons not to use a generator[1], and, as with anything, the trade-offs need to be carefully considered before deciding.

This all reminds me -- I've been meaning to play with Treetop for Ruby lately :)

  --
[1]: For example, it tends to be much easier to provide good error messages with a recursive-descent parser. It's not always obvious how a particular configuration for a table-driven parser corresponds to a particular mistake or code formation. There has been some fruitful research in that area of late, though.

There's also something to be said for not having a dependency on an extra tool, especially since such tools tend to be complicated and difficult to debug or extend. As such, when you encounter a language construct that's not easily definable for the generator, you may have an easier time resorting to some hacks to get what you want rather than extending the tool.


> it tends to be much easier to provide good error messages with a recursive-descent parser

It is not any more complicated with a parser generator (if it's a PEG-based one). In fact, it's much easier; you avoid a lot of boilerplate this way.

> There's also something to be said for not having a dependency on an extra tool

Do not have a dependency on an extra tool. Use a meta-language, in which a parser generator can be embedded as a first class language feature.


> It is not any more complicated with a parser generator (if it's a PEG-based one). In fact, it's much easier, you're avoiding a lot of boilerplate this way.

True. PEGs are effing great!

> Do not have a dependency on an extra tool. Use a meta-language, in which a parser generator can be embedded as a first class language feature.

This is ideal, but unfortunately not always possible. For example, if you're stuck writing a language processor in C, you don't get the necessary tools for linguistic abstraction without an external dependency. In general, though, I agree with the sentiment.


They are not very noob-friendly: "shift/reduce conflict found in state 111", etc. :)


Given how much of a novice I still am in the subject, you might want to take my suggestions with a grain of salt compared to others'.

I have always really enjoyed just reading the source code for programming languages. As I learn more and more, I seem to take away a bit more each time.

Personally I've enjoyed reading through the source code for Go, since the compiler is hand-written in Go. Being hand-written, it can be a little repetitive to read through, but I find it pretty easy to read and understand.

https://github.com/golang/go/tree/release-branch.go1.6/src/g...

Also, I have read (at least parts of) the book "Engineering a Compiler". Being a novice in the subject, some of it goes over my head, but I think it does a better job outlining the topic than any other book I have read.

https://www.amazon.com/Engineering-Compiler-Second-Keith-Coo...

Enjoy :)


I highly recommend Peter Norvig's example Lisp interpreters. He has Python and Java implementations and they are both very accessible, so you can digest them in a few hours' time. "Production" language interpreters tend to have so much complexity that it can be hard even for an expert to find the essence of the interpreter.


I recently created a toy interpreter for a minimalistic C-like language that is easily extensible:

https://github.com/bbu/simple-interpreter

I am also currently developing a "real" language that is based on the same lexing and parsing machinery:

https://github.com/bbu/quaint-lang



This looks great. Thanks!


I like Haskell + Parsec.



