Writing An Interpreter In Go (2016) (interpreterbook.com)
305 points by tambourine_man on Nov 25, 2019 | hide | past | favorite | 60 comments



I had a lot of fun with this book, after belatedly buying it. I've read a couple of similar online-books in the past, but this was the first one I worked my way through completely.

As a result I later had a lot of fun writing a simple compiler for a reverse-polish calculator, generating AMD64 assembly language https://github.com/skx/math-compiler/ , wrote a simple BASIC interpreter https://github.com/skx/gobasic , and put together a couple of DSL projects which followed a similar approach.
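Not the linked project's code, but the core of any reverse-polish calculator is a small stack machine; a minimal Go sketch of that evaluation loop (the compiler variant emits AMD64 instructions for the same stack operations):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// EvalRPN evaluates a space-separated reverse-polish expression
// using a value stack, e.g. "3 4 + 2 *" => 14.
func EvalRPN(src string) (int64, error) {
	var stack []int64
	for _, tok := range strings.Fields(src) {
		switch tok {
		case "+", "-", "*", "/":
			if len(stack) < 2 {
				return 0, fmt.Errorf("stack underflow at %q", tok)
			}
			// Pop the two topmost operands.
			b, a := stack[len(stack)-1], stack[len(stack)-2]
			stack = stack[:len(stack)-2]
			var r int64
			switch tok {
			case "+":
				r = a + b
			case "-":
				r = a - b
			case "*":
				r = a * b
			case "/":
				r = a / b
			}
			stack = append(stack, r)
		default:
			n, err := strconv.ParseInt(tok, 10, 64)
			if err != nil {
				return 0, err
			}
			stack = append(stack, n)
		}
	}
	if len(stack) != 1 {
		return 0, fmt.Errorf("leftover operands on stack")
	}
	return stack[0], nil
}

func main() {
	v, _ := EvalRPN("3 4 + 2 *")
	fmt.Println(v) // 14
}
```

A compiler along these lines replaces each `append`/pop pair with `push`/`pop` (or register shuffling) in the emitted assembly.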

All in all I'd recommend reading it, and definitely working through the code. Once you're done you can continue to experiment - adding support for regular expressions, a standard-library, etc, etc.

My version of the language is here and contains a fair number of improvements/additions. After covering the core in such detail it wasn't hard to extend further, even if that was primarily for fun, rather than for a real purpose:

https://github.com/skx/monkey/


Can anyone recommend other books similar to this one?

Having done the Nand2Tetris course and started the Ray Tracer Challenge book, I find I really like these books that guide you through a pretty complex project.

It helps you learn by doing while at the same time preventing you from falling into bad practices or getting overwhelmed.


I haven't worked through it (yet!), but I've read parts of it and heard only good things about Bob Nystrom's Crafting Interpreters [0]: http://craftinginterpreters.com/

If you like Scheme/Racket, I can also recommend Beautiful Racket [1]. That was quite a dose for my macro-loving brain.

Then I also recommend this "Let's Build A Compiler" series of blogposts [2] that roughly follows Abdulaziz Ghuloum's relatively famous (amongst fellow compiler fans) paper "An Incremental Approach to Compiler Construction" [3]. I've followed that series and paper for the past three months and built a Scheme to x86 compiler in Scheme. That was a lot of fun!

[0]: http://craftinginterpreters.com/

[1]: https://beautifulracket.com/

[2]: https://generalproblem.net/lets_build_a_compiler/01-starting...

[3]: http://lambda-the-ultimate.org/node/1752


Crafting Interpreters by Bob Nystrom is amazing:

https://www.craftinginterpreters.com/

Only caveat is that the book is not done yet, but there is already a ton of content.


My colleague is working on Making a RISC-V Operating System using Rust. It is a free online book.

http://osblog.stephenmarz.com/index.html


https://www.itu.dk/people/sestoft/plc/

Sestoft is a very clear writer, and F# is a better language than most for writing interpreters or compilers.


How is the quality of F# in that book? I've been itching to make the leap and this seems like a good way to do that, but I'd feel more confident if I knew that Sestoft's demonstration of the language set a good example.


I've never done F# professionally so I can't speak to how idiomatic it is, but it gets the job done from a learning point of view.


There's also https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_...

Learning Haskell via writing a Scheme interpreter.

As you say you've been through a couple of these books/guides before, I just wondered how you would characterise your learning experience, in terms of how much you think you've learned, whether you would have learned the same via other means and so on?


Not a book, but an interesting self-guided course and apparently a great way to pick up a new language: Make A Lisp [0],[1],[2]

Basically, work through building a lisp interpreter in any given language (C, Java,...bash, vimscript) in 10 steps. Each step is backed by tests, and these tests help to guide you through the process. Very cool.

[0] http://kanaka.github.io/lambdaconf/

[1] https://github.com/kanaka/mal

[2] https://www.youtube.com/watch?v=jVhupfthTEk


I'd recommend this tutorial https://ruslanspivak.com/lsbasi-part1/. It's Python.


Highly recommend this book. After reading it (and the compiler book by the same author), I started my own compiler for Knox, a Go-like language. It has been a fun project, but I made some bad decisions regarding the type checker and how I store the AST, so I'm currently procrastinating fixing those. You can check out the source here:

https://github.com/AZHenley/knox/


This week's Go Time podcast featured the author. It was enjoyable (as are all their podcasts). He also has a compiler book that is the sequel.

https://changelog.com/gotime/107


There recently also was an entertaining podcast episode with Thorsten Ball on SE Radio, in which he talks about compilers vs. interpreters and about his book as well: https://www.se-radio.net/2019/05/365-thorsten-ball-on-buildi...


What timing. I just started implementing an interpreter in Go this past weekend. I'd been following Bob Nystrom's https://www.craftinginterpreters.com, translating Java into Go as well as comparing my impl to the Go packages scanner, parser, etc. (https://golang.org/pkg/go/) It'll be very interesting to compare my own approaches with Thorsten Ball's to see if my use of idioms and intuitions about certain things were on the right track.


I'm about 1/4 of the way through it. I enjoy it so far. My only criticism is that I wish it had something in the way of exercises. It's a lot of copying (typing) code, with the book explaining what the code is doing and the concepts behind what you've written. That's a good approach, but I also enjoy thinking through things on my own a little, then being told how the author would have done it and why. Otherwise very good!


Lol, ebooks should have regional pricing like games. Buying these books means I would have to shell out a third of my total monthly income (I'm a student in a third-world country). Makes me wonder if there is a market for a paid service that makes copies of books available for a monthly rate (kind of like a paid ebook library with region locking and a special format that can only be read by an app with a constant internet connection; Netflix for books, if you will). But I guess it would be hard to do because it's just text, and if there were money in this, Amazon would have done it already.


Pick up a free trial at safaribooksonline / oreilly. Perhaps someone or your institution would sponsor a paid 50 usd per month account.


The author is very friendly and responsive. I'm sure he'll respond to your concerns if you reach out to him.


Ah, my comment was not particularly about this book. It was just a general assessment of how pricing may be a barrier for useful and practical books to a larger set of audience. And a little bit of thinking out loud if there was a viable solution to it.


I find it interesting that the Monkey language, created solely for the examples implemented in this book, looks strikingly similar to a simplified EcmaScript (ES2015+ actually). Is ES/JS the best example of a parseable language? Why not just shoot for a (subset) ES interpreter instead of a made up language that does not resonate as much with the reader?


Using a fantasy language lets you avoid anything that makes the implementation complicated or tedious.


I worked through this soon after it came out and I enjoyed it a great deal.


Have you read other books with similar goals? How does this compare? I've been meaning to go through and write an interpreter / compiler but haven't found the time / right resources, but this seems interesting because I've grown to love Go more lately, and how much you get out of the box. I might just try it, but I'm curious what someone else's experience might be who was in my shoes when they read it.


I studied compiler design at university and the subject matter here was a lot more approachable, but in that, it sacrifices a lot of the theory.


I agree that it doesn't dive deep into a lot. It also makes notes of things you don't need to know while you're working through the book. That said, if you just want a basic grasp of what is happening to your code when you run it, this is a good place for that. I don't plan on writing a compiler outside of this book, and I don't think this book directly serves doing that. What it does serve is my curiosity in a compact way.


I'm likely intending to do further research for the unknowns I encounter in the book. I've looked at some books for language design that get heavy into math that I'm no longer familiar with and it's discouraging, so this might be the sweet spot for me, thanks for the response.


As someone who just recently started learning Rust, I'd love to see a book like this that uses it as the implementation language instead.


I'm curious about this chapter:

"Why not a parser generator"

Seems like the way to go these days? Why write it yourself if you can generate it from EBNF notation?


Author here. I'm going to be super shameless here and quote myself in that chapter:

> [...] we are here to learn, we want to understand how parsers work. And it’s my opinion that the best way to do that is by getting our hands dirty and writing a parser ourselves. Also, I think it’s immense fun.

In other words: use a parser generator when you need a working parser, quick. But if you want to learn how parsers work, what ASTs are, what "top-down parsing" means, etc. then I recommend you write your own. It's not that hard once the concept "clicked" and, again, it's a ton of fun :)
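To make "writing a parser ourselves" concrete: the book uses top-down (Pratt-style) operator-precedence parsing; here's a hedged, minimal Go sketch of that idea (the token handling and names are simplified inventions, not the book's code). It emits a parenthesized string instead of an AST so the precedence decisions are visible:

```go
package main

import (
	"fmt"
	"strings"
)

// Operator precedence table: higher binds tighter.
var prec = map[string]int{"+": 1, "-": 1, "*": 2, "/": 2}

type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) next() string { t := p.peek(); p.pos++; return t }

// parseExpr parses operators of at least minPrec, recursing with a
// higher minimum for the right-hand side to get left associativity.
func (p *parser) parseExpr(minPrec int) string {
	left := p.next() // operand
	for {
		op := p.peek()
		pr, ok := prec[op]
		if !ok || pr < minPrec {
			return left
		}
		p.next()
		right := p.parseExpr(pr + 1)
		left = fmt.Sprintf("(%s %s %s)", left, op, right)
	}
}

// Parse parses a space-separated infix expression.
func Parse(src string) string {
	p := &parser{toks: strings.Fields(src)}
	return p.parseExpr(1)
}

func main() {
	fmt.Println(Parse("1 + 2 * 3 - 4")) // ((1 + (2 * 3)) - 4)
}
```

The whole trick is that one `minPrec` parameter; once that clicks, extending to prefix operators, grouping, and calls is mostly bookkeeping.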


> use a parser generator when you need a working parser, quick.

Not necessarily. Use it also when you need a more correct one :-)

The way the book presents the parsing subject leaves the impression that a handcrafted parser is generally preferable.

I have experience with a project that was based on that book, and there were (and probably, are) a lot of bugs in the parsing code, that wouldn't have been there if a parser generator was used.

I don't camp for using parser generator at all costs, however, presenting both sides of the coin would allow readers to make a (more) informed choice.


> I have experience with a project that was based on that book, and there were (and probably, are) a lot of bugs in the parsing code, that wouldn't have been there if a parser generator was used.

I think I need to add a big "do. not. use. in. production!" disclaimer then :) The parser we build in the book is, of course, not a battle-tested, industrial-grade parser that can survive every fuzzer you throw at it. Far from it, but that's fine, since that was never the goal. The goal was always to learn.

So, yes, I agree. If you do need a production-ready, stable parser: use a mature parser generator or, maybe, invest more time in getting good at writing parsers than it takes to read and work through the ~100 pages I wrote on the subject.


Since you have written a book on the subject you obviously are much more of an expert than I am. After reading your book, what materials would you recommend for someone wanting to reach the stage of being able to implement their own "production grade" parsers?


Yeah, I did the Nand2Tetris course which culminates in writing a toy compiler and the philosophy there is the same as yours.

I really like that philosophy - as Feynman said: "What I cannot build I do not understand"


> "What I cannot build I do not understand"

This is true, but it provides no guidance on the question whether you need to understand it. If you don't want to build a web browser, you don't need to understand how web browsers are built. That doesn't mean that it is dishonorable for you to use a web browser.

So do you need to understand how generated parsers work? I think that's a matter of preference that Feynman doesn't help you decide.


If you’re designing a language, it’s not a stretch to assume that understanding how parsers work would be useful.


I think "you plug a grammar into a generator, and you get back a function that maps character sequences to a syntax tree" can be enough understanding for a lot of use cases. Maybe you're more interested in one of the many other aspects that go into designing and implementing an interpreted language: type checking, garbage collection, an efficient bytecode format and interpreter, for example. But maybe you aren't, in which case I agree that a hand-written parser might be right up your alley! There's no one size fits all.


You need a little bit more than that, cause if that is the limit of your understanding, you will have a bad time when the generator spits back an error message about reduce/reduce conflicts or whatever instead of a function.


He discussed that on the Corecursive podcast episode [0].

> Why not a parser generator

IIRC it was about "minimising magic" — showing what's actually involved in the native code. Bob Nystrom makes the same decision in "Crafting Interpreters" [1]. Quoting [2]:

"Many other language books and language implementations use tools like Lex and Yacc, “compiler-compilers” to automatically generate some of the source files for an implementation from some higher level description. There are pros and cons to tools like those, and strong opinions—some might say religious convictions—on both sides.

We will abstain from using them here. I want to ensure there are no dark corners where magic and confusion can hide, so we’ll write everything by hand. As you’ll see, it’s not as bad as it sounds and it means you really will understand each line of code and how both interpreters work."

[0] https://corecursive.com/037-thorsten-ball-compilers/

[1] https://craftinginterpreters.com/

[2] https://craftinginterpreters.com/introduction.html#the-code


I think the conventional wisdom is that parser generators make it hard(er) to produce useful error messages for users. There are some exceptions, for example: http://pauillac.inria.fr/~fpottier/slides/fpottier-2015-11-o... (don't be put off by the French title, the rest is in English).


An interesting argument against that is:

> There’s also some argument made by GCC and Clang teams that they can hand-tune a hand-written parser to get better error messages. I believe they can do such hand tuning and probably have. But that argument does not stop you from customizing the grammar for a generated parser to do essentially the same thing, so I don’t buy it as a differentiator.

This is an excerpt from an interesting Quora reply on the subject: https://www.quora.com/Are-there-many-established-programming...


Yes, given a sufficiently powerful parser generator (more powerful than Yacc), you can do that. That's also the point of the talk I linked.

Ira Baxter makes a living out of selling proprietary technology based on GLR parsing. His opinion matters, but if there is no open source GLR parser generator available (is there? I don't know), the point he makes cannot apply to GCC or Clang.


You don't necessarily need something more powerful than Yacc. I think the point was that you can encode errors directly into your grammar. So for some language with semicolon statement separators you can do something like this:

    PROGRAM -> STATEMENT*
    STATEMENT -> IF-STATEMENT | ASSIGN-STATEMENT | STATEMENT-ERROR
    IF-STATEMENT -> ...
    ASSIGN-STATEMENT -> ...
    STATEMENT-ERROR -> TOKEN* SEMICOLON

The idea being that if the if-statement and assign-statement rules fail, you consume tokens until the next statement separator (a semicolon in our case), produce an Error node in the AST, and resume parsing the next statement. This allows your parser to catch multiple errors in one pass, since it won't die the moment it encounters something it can't parse. After parsing you scan the AST and, if it has any Error nodes, you spit out the relevant error messages and exit. You can get as granular with the error rules as you want, although the more you add the more unwieldy the grammar becomes.

Naturally, doing this in practice is harder than it sounds. Tweaking the error rules is hard: if you consume too much or too little input, you will resume in a spot that has no chance of ever succeeding, giving way to a cascade of parse errors. I think we've all seen a compiler spit out a couple of dozen error messages all stemming from a single root error like a missing closing parenthesis or brace.
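The same "skip to the next semicolon" recovery works in a hand-written parser too. Here's a self-contained, deliberately tiny Go sketch (the `parseStatement` grammar is an invented `IDENT = IDENT ;` toy, not any real language) showing panic-mode synchronization: collect the error, skip to the next `;`, and keep going so several errors surface in one pass:

```go
package main

import "fmt"

type parser struct {
	toks []string
	pos  int
	errs []string
}

func (p *parser) atEOF() bool  { return p.pos >= len(p.toks) }
func (p *parser) peek() string { return p.toks[p.pos] }
func (p *parser) advance()     { p.pos++ }

// parseStatement recognizes the toy form IDENT = IDENT ;
// and reports failure for anything else.
func (p *parser) parseStatement() (string, bool) {
	if p.atEOF() || p.peek() == "=" || p.peek() == ";" {
		return "", false
	}
	name := p.peek()
	p.advance()
	if p.atEOF() || p.peek() != "=" {
		return "", false
	}
	p.advance()
	if p.atEOF() || p.peek() == ";" {
		return "", false
	}
	val := p.peek()
	p.advance()
	if p.atEOF() || p.peek() != ";" {
		return "", false
	}
	p.advance()
	return name + "=" + val, true
}

// synchronize skips tokens up to and including the next ';',
// the statement boundary where parsing can safely resume.
func (p *parser) synchronize() {
	for !p.atEOF() && p.peek() != ";" {
		p.advance()
	}
	if !p.atEOF() {
		p.advance()
	}
}

func Parse(toks []string) (stmts, errs []string) {
	p := &parser{toks: toks}
	for !p.atEOF() {
		if s, ok := p.parseStatement(); ok {
			stmts = append(stmts, s)
		} else {
			p.errs = append(p.errs, fmt.Sprintf("bad statement near token %d", p.pos))
			p.synchronize()
		}
	}
	return stmts, p.errs
}

func main() {
	// The middle statement "= 2 ;" is malformed; parsing continues past it.
	stmts, errs := Parse([]string{"x", "=", "1", ";", "=", "2", ";", "y", "=", "3", ";"})
	fmt.Println(stmts, len(errs)) // [x=1 y=3] 1
}
```

The tuning problem the comment describes shows up here too: pick the wrong synchronization token and one typo still cascades.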


I believe GNU Bison parser generator is capable of generating GLR parsers:

https://www.gnu.org/software/bison/manual/html_node/GLR-Pars...


Oh wow, that's pretty cool, wish I had known about it before. Unfortunately flex and bison, when used together in C++, cause a lot of headache (unless you create an "adapter" class that wraps your Flex lexer and pass that to Bison). :/


One of the C# developers commented [1] on HN that they use a hand written parser for three reasons: incremental re-parsing, better error reporting, and resilient parsing (i.e., you still get a syntax tree even if something doesn't parse).

[1] https://news.ycombinator.com/item?id=13915150


In my experience the hardest part of using parser generators is to have them create understandable errors.

Also it's questionable that they add very much. Parsing text isn't all that difficult. The hard part is to get the tree structure of the AST right.

All in all, parsing is quite a small part of a compiler. I've converted from a generated parser to manual code in two different projects, and they actually ended up with fewer lines of code after the conversion, if just barely.


> Seems like the way to go these days?

No I think people use parser generators less these days.

> Why write it yourselves if you can generate it from EBNF notation?

Can you write EBNF for it? Can you generate a parser from the EBNF? EBNF and most parser generators are designed for context-free languages. Most languages are context-sensitive. You can see where things start to go wrong...


If you are super familiar with EBNF notation, it may be the fastest way. Otherwise I like parser combinators for their simplicity, you just need to understand the host language.
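For readers who haven't met parser combinators: the whole idea is that a parser is just a function from input to (result, remaining input, success), and you build big parsers by composing small ones. A minimal, hedged Go sketch (names like `Lit`/`Seq`/`Alt` are conventional, not from any particular library):

```go
package main

import (
	"fmt"
	"strings"
)

// Parser consumes a prefix of the input and either fails or
// returns the matched text plus the remaining input.
type Parser func(in string) (result, rest string, ok bool)

// Lit matches an exact literal prefix.
func Lit(s string) Parser {
	return func(in string) (string, string, bool) {
		if strings.HasPrefix(in, s) {
			return s, in[len(s):], true
		}
		return "", in, false
	}
}

// Seq runs parsers in order, concatenating their results;
// if any fails, the whole sequence fails without consuming input.
func Seq(ps ...Parser) Parser {
	return func(in string) (string, string, bool) {
		var out strings.Builder
		rest := in
		for _, p := range ps {
			r, rem, ok := p(rest)
			if !ok {
				return "", in, false
			}
			out.WriteString(r)
			rest = rem
		}
		return out.String(), rest, true
	}
}

// Alt tries parsers in order and keeps the first success.
func Alt(ps ...Parser) Parser {
	return func(in string) (string, string, bool) {
		for _, p := range ps {
			if r, rem, ok := p(in); ok {
				return r, rem, true
			}
		}
		return "", in, false
	}
}

func main() {
	greeting := Seq(Alt(Lit("hi"), Lit("hello")), Lit(" world"))
	r, _, ok := greeting("hello world!")
	fmt.Println(r, ok) // hello world true
}
```

Because the combinators are ordinary functions in the host language, you get to reuse all your normal tooling; the grammar and the code are the same artifact.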


The C# and Go parsers do not use a parser generator, iirc.


I believe you are correct about Go. The rationale I recall was that generated parsers were slower and afforded less control over error reporting. Notably wrt performance, fast compilation was a major goal for Go from the beginning.


I really enjoyed this book. In fact, after reading through it I wrote some code to tokenize TypeScript with TypeScript. It was a fun little project even though I didn't really go all the way with it.


Funny, I was listening to the Go Time podcast during my commute this morning and they had the author on...

The author seems to suggest that one should start with an interpreter, then move to a compiler (read the interpreter book, then read the compiler book...).

I think it really depends on what you are doing. Python is a good example: it compiles to bytecode that is then interpreted, so it has both the compiling thing going on and the whole VM thing. I believe Java is similar. What I'm trying to say is that the line is very blurry nowadays.


I'm not an expert, just a hobby language designer, but it seems to me that the advantage of starting with an interpreter first is that this allows you to more easily design a language in which compile-time expressions have some reasonable expressivity. Generally, an interpreter allows the designer to experiment more with the semantics. Typical examples are evaluation of expressions for constant definitions at compile time or having built-in access and language support (e.g. syntactic sugar) to 3rd party libraries like arbitrary precision arithmetic.

The disadvantage is that it can be a lot of work. From my experience, writing a toy VM in Ada didn't seem much easier than writing a compiler in a toolkit like LLVM. (But I've never done the latter.)


Compiling to a bytecode like Python or javac is very different from compiling to machine code like GCC or the JVM.

I think a progression like AST interpreter -> bytecode compiler and interpreter -> machine-code compiler is a reasonable way to go. Starting with AST -> machine code is possible of course (that's what university compiler courses often do), but if you are explicitly interested in both compilers and interpreters, doing interpreters first makes more sense to me. It's a steady progression from higher to lower level.
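The first stage of that progression, a tree-walking (AST) interpreter, fits in a few lines. A minimal Go sketch (arithmetic only; the `Node`/`BinOp` names are illustrative, not the book's), where evaluation is just recursive descent over the tree:

```go
package main

import "fmt"

// Node is an AST node that can evaluate itself.
type Node interface{ Eval() int64 }

// Lit is an integer literal leaf.
type Lit int64

func (l Lit) Eval() int64 { return int64(l) }

// BinOp is a binary operator node with two child subtrees.
type BinOp struct {
	Op          byte
	Left, Right Node
}

// Eval recursively evaluates both children, then applies the operator.
func (b BinOp) Eval() int64 {
	l, r := b.Left.Eval(), b.Right.Eval()
	switch b.Op {
	case '+':
		return l + r
	case '-':
		return l - r
	case '*':
		return l * r
	default:
		return l / r
	}
}

func main() {
	// AST for (1 + 2) * 3
	ast := BinOp{'*', BinOp{'+', Lit(1), Lit(2)}, Lit(3)}
	fmt.Println(ast.Eval()) // 9
}
```

The bytecode stage then replaces `Eval` with a compile step that flattens this tree into instructions for a loop over a flat array, which is exactly the jump the sequel compiler book makes.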



[flagged]


Most languages are. IMHO, the best language for writing a compiler or interpreter is Prolog.


The ML family and Haskell are good for writing compilers because sum types and pattern matching lend themselves well to ASTs and symbolic manipulation. OCaml and Haskell have been used for real-world language projects (OCaml is used to implement OCaml, Coq, Haxe, and Hack, and was used to implement the original version of Rust; Haskell is used to implement GHC, Elm, PureScript, Agda, and Idris). I'm making a programming language in OCaml, and I quite like the experience. I've never used Prolog, though.


Haskell was also used for Clash, Futhark, Egison, and the Curry frontend.

I think there are others written in OCaml that I may be forgetting.


I've used Haskell before; my understanding is that anything outside of the ML family is needlessly shooting yourself in the foot.

Why you'd try to write a compiler in a language that deliberately cripples sum types (Go) is beyond me.



