Show HN: Writing my first interpreter in C for fun

naveen99 · on June 22, 2016

Sometimes i wish for compiler tutorials that progress feature by feature, starting with simple features, rather than starting with complicated things like parsing expressions with multiple op codes (op codes being implemented as function calls, so it gets even more complicated)...

For example, a sequence like this:

1. variable assignment

2. commands of one parameter

3. if / branch statements

4. much later: nested expressions, function calls, multiple stacks, environments

basically, starting with an abstraction of 3 parameter opcodes, and then slowly building features up the abstraction layers, as well as slowly building down the optimization layers, step by step.

segmondy · on June 22, 2016

You are looking for this, "let's build a compiler" http://www.stack.nl/~marcov/compiler.pdf

naveen99 · on June 23, 2016

not exactly. more like an expanded version of chapter 5 of sicp https://mitpress.mit.edu/sicp/full-text/book/book-Z-H-31.htm...

I will put it on github if i ever get around to writing one myself.

I am thinking something targeting / including a toy version of qemu.

Flow · on June 22, 2016

Something like this?

https://youtube.com/watch?v=CqhL-BDT8lg

Dangeranger · on June 21, 2016

Hey this is great work! What resources would you recommend for someone else who wants to build their own toy language for fun and practice?

GregBuchholz · on June 21, 2016

It is hard to beat using Prolog for whipping out interpreters. Like Lisp, you get to use symbols in terms, plus you also get to use Prolog's ability to use infix operators to create terms:

    eval(X,X) :- number(X).
    eval(X+Y,Z) :- eval(X,X1), eval(Y,Y1), Z is X1+Y1.
    eval(X*Y,Z) :- eval(X,X1), eval(Y,Y1), Z is X1*Y1.
    
    main :- write("Enter an arithmetic expression followed by a dot \".\" and a newline\n"),
            prompt(_Old,>),
            repl.

    repl :- read(Input),
            eval(Input,Answer),
            print(Answer), nl,
            repl.

You might like:

https://www.metalevel.at/lisprolog/

http://www.swi-prolog.org/

https://www.amazon.com/Art-Prolog-Advanced-Programming-Techn...

agumonkey · on June 21, 2016

Evaluator; Parsers too. Prolog is an amazing idea (even with its limitation). But when you come from the bit array banging side of computer programming it's such an alien idea.

GregBuchholz · on June 22, 2016

For those interested in parsing with Prolog, check out Definite Clause Grammars and The Craft of Prolog.

http://www.learnprolognow.org/lpnpage.php?pagetype=html&page...

https://www.amazon.com/Craft-Prolog-Logic-Programming/dp/026...

segmondy · on June 22, 2016

Just a logically way of thinking. Many people don't know it, but Java is verified with prolog.

https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.ht...

agumonkey · on June 22, 2016

Haaaa I remember reading the spec, before I knew prolog .. didn't realize what I was reading. Neat.

Immortalin · on June 23, 2016

You can always try Erlang's pattern matching..

DanWaterworth · on June 21, 2016

I've also been creating a programming language recently [1]. I'd recommend looking into the pypy tool chain. It makes it pretty easy to hack together a decent interpreter (that JITs). That and rply for the front-end gives you enough tooling to be dangerous. Here's a good tutorial [2].

[1] https://github.com/DanielWaterworth/plastic

[2] https://morepypy.blogspot.co.uk/2011/04/tutorial-writing-int...

jhedwards · on June 21, 2016

Thank you! I've seen a lot of great articles and tutorials on HN about this subject, so you can probably find a lot of good resources doing a search here. To start I can link you to an article I wrote that teaches the basics of tokenizing and parsing: https://angularclass.com/making-a-mini-lisp-introduction-to-...

fn1 · on June 21, 2016

The classic way would be to start out with LEX/FLEX and YACC.

zzzcpan · on June 21, 2016

Forget about lex, flex, bison, yacc and all that. It's way easier and more important to start with a hand written recursive descent parser.

textmode · on June 22, 2016

Q: Why do those programs exist? Lex and yacc originally come from Bell Labs. Why wouldn't they use what you recommend instead?

And when I look at the credits for lex and yacc today I see some very capable programmers. Why do these programs exist? I may be biased somewhat as I use them regularly for small tasks, but my question is an honest one.

Also, someone else mentioned swi-prolog. I think gprolog is just as easy and is noticably faster. Curious what HN readers think are the advantages to swi-prolog.

pjmlp · on June 22, 2016

Because many don't know any better and are stuck in 70's parser tools mindset.

Tools like ANTLR[0], SableCC[1], JavaCC[2], MPS[5] and approaches like Attribute Grammars[3], Parsing expression grammars[4] are much more suitable for the modern days.

I don't know gprolog, but SWI has GUI tooling, compilation and I was already using it in 1997, so it is very sound toolchain.

[0] - http://www.antlr.org/

[1] - https://github.com/SableCC/sablecc

[2] - https://javacc.java.net/doc/JJTree.html

[3] - https://wiki.haskell.org/Attribute_grammar

[4] - https://en.wikipedia.org/wiki/Parsing_expression_grammar

[5] - https://www.jetbrains.com/mps

91891879181 · on June 22, 2016

I've tried Antlr and it was a horrible experience compared to Menhir: Being stuck with LL(n) causes a lot of unnecessary refactoring, the syntax for anything out of the ordinary is arcane and IMO not well documented.

Give me the 70's tools any day.

pjmlp · on June 22, 2016

You should of course use a tool that supports the grammar productions of the language being designed.

I just listed a few examples.

groovy2shoes · on June 22, 2016

Menhir is hardly a 70s tool...

99189198191 · on June 22, 2016

This is true of course. But it is ocamlyacc compatible and uses a separate lexer.

I'd at least say that it is using the lex/yacc approach, which I actually like.

sklogic · on June 22, 2016

LL(n) can be limiting indeed. But you can easily mix, say, PEG and Pratt. Because where else do you need LR-like parsing but in the binary expressions, the C-style dot syntax and so on.

tptacek · on June 22, 2016

I think this is kind of a big question, isn't it? You're in effect asking why people would write LL parsers versus LR.

The short answer might be that LL is simpler, and that handwritten recursive descent might be faster (especially if lexing and later stages are directly integrated into the parser) than tooled LALR parsers. LR is more expressive and can directly handle more grammars?

groovy2shoes · on June 22, 2016

I think that, in the general case, the idea that recursive-descent parsers can only parse LL(k) languages is not quite true. For some theoretical conception of "parser", perhaps, but most real, hand-rolled recursive-descent parsers rarely recognize precisely an LL(k) language, because programmers tend to mix-and-match algorithms (mixing in a little shunting yard or precedence-climbing to handle infix operators, for example) and they tend to connect the parser with other modules in less restrictive ways (add in some two-way communication with the lexer and you can parse yourself a context-sensitive language, even!).

That's not to say that I don't agree with the general argument that there's no sense in shunning parser-generators all the time. I've made good use of yacc (and ocamlyacc, and menhir, and Lua LPEG, etc.) in the past, and the parser for our commercial compiler at work is written with ANTLR to generate an LL() parser in C++. Such metaprogramming tools let us work at a higher level of abstraction, and we get all the usual benefits of working at a higher level.

I think that, in most* cases, there's really no good reason not to use a parser generator (or, alternatively, parser combinators). I don't even suspect that you learn anything in particular that you wouldn't learn by learning how to formalize your grammar and specify semantic actions for a parser generator (unless perhaps you aren't familiar with mutual recursion, that is...). However, there are some good reasons not to use a generator[1], and, as with anything, the trade-offs need to be carefully considered before deciding.

This all reminds me -- I've been meaning to play with Treetop for Ruby lately :)

--

[1]: For example, it tends to be much easier to provide good error messages with a recursive-descent parser. It's not always obvious how a particular configuration for a table-driven parser corresponds to a particular mistake or code formation. There has been some fruitful research in that area of late, though.

There's also something to be said for not having a dependency on an extra tool, especially since such tools tend to be complicated and difficult to debug or extend. As such, when you encounter a language construct that's not easily definable for the generator, you may have have any easier time resorting to some hacks to get what you want rather than just extending the tool.

sklogic · on June 22, 2016

> it tends to be much easier to provide good error messages with a recursive-descent parser

It is not any more complicated with a parser generator (if it's a PEG-based one). In fact, it's much easier, you're avoiding a lot of boilerplate this way.

> There's also something to be said for not having a dependency on an extra tool

Do not have a dependency on an extra tool. Use a meta-language, in which a parser generator can be embedded as a first class language feature.

groovy2shoes · on June 28, 2016

> It is not any more complicated with a parser generator (if it's a PEG-based one). In fact, it's much easier, you're avoiding a lot of boilerplate this way.

True. PEGs are effing great!

> Do not have a dependency on an extra tool. Use a meta-language, in which a parser generator can be embedded as a first class language feature.

This is ideal, but unfortunately not always possible. For example, if you're stuck writing a language processor in C, you don't get the necessary tools for linguistic abstraction without an external dependency. In general, though, I agree with the sentiment.

0x07c0 · on June 21, 2016

they are not very noob friendly, "shift reduce conflict found in state 111" etc :)

brettlangdon · on June 21, 2016

Given how much of a novice I still am in the subject, might want to take my suggestions with a grain of salt compared to others.

I have always really enjoyed just reading the source code for programming languages. As I learn more and more, I seem to take away a bit more each time.

Personally I've enjoyed reading through the source code for Go, since it is hand written in Go. Being hand written, it can be a little repetition reading through it, but I find it to be pretty easier to read/understand.

https://github.com/golang/go/tree/release-branch.go1.6/src/g...

Also, I have read (at least parts of) the book, "Engineering a Compiler", which being a novice in the subject, some of it goes over my head, but I think it does a better job outlining the topic than any other books I have read.

https://www.amazon.com/Engineering-Compiler-Second-Keith-Coo...

Enjoy :)

bertr4nd · on June 21, 2016

I highly recommend Peter Norvig's example Lisp interpreters. He has Python and Java implementations and they are both very accessible, so you can digest them in a few hours' time. "Production" language interpreters tend to have so much complexity that it can be hard even for an expert to find the essence of the interpreter.

bluetomcat · on June 22, 2016

I recently created a toy interpreter for a minimalistic C-like language that is easily extensible:

https://github.com/bbu/simple-interpreter

I am also currently developing a "real" language that is based on the same lexing and parsing machinery:

https://github.com/bbu/quaint-lang

Nr7 · on June 22, 2016

http://www.buildyourownlisp.com/

reachtarunhere · on June 22, 2016

This looks great. Thanks!

sleepychu · on June 22, 2016

I like Haskell + Parsec.

melling · on June 22, 2016

Here are my compiler resources:

https://github.com/melling/ComputerLanguages/blob/master/com...

weberc2 · on June 21, 2016

Very cool. Knowing little about programming language implementation, I wonder what it would take to make a compiler. For so simple a language, I would think it would be not much harder, no?

0x07c0 · on June 21, 2016

Not looked at OPs code, but if you build this stuff how a compiler book tells you to... The front end of a interpreter and a compiler is the same. You build a AST, if you interpret you well interpret that. If you have a compiler, the AST is turned into the target language, in most cases asm. The front end is the scanner/Lexer, the parser and most likely semantic analyser. So to answer you question :), if you make a non optimizing compiler, turning the AST into asm is trivial. The hard part of making compilers is the optimization. Most of the front end you do not need to write actually! There are compiler-compilers for that.

https://en.wikipedia.org/wiki/Abstract_syntax_tree

https://en.wikipedia.org/wiki/Compiler#Front_end

https://en.wikipedia.org/wiki/Lexical_analysis

https://en.wikipedia.org/wiki/Parsing

https://en.wikipedia.org/wiki/Semantic_analysis_%28compilers...

groovy2shoes · on June 22, 2016

In fact, you can write a naïve interpreter or compiler completely in lex and yacc :)

felixangell1024 · on June 21, 2016

Depends what you compile to. Writing a register allocator may be a little tricky (if you compile to a VM/Assembly/..), though if you compile to something like C, it's very easy. LLVM is another alternative which is also relatively simple to learn.

andars · on June 21, 2016

Spilling every variable to memory isn't too bad if you just want it to work and aren't worried about performance.

sklogic · on June 22, 2016

Linear scan register allocation is "good enough" in most cases and it is trivial if you're doing it on top of an SSA.

sklogic · on June 22, 2016

Compilers are in general far simpler than interpreters, especially if you're trying to build an interpreter with a reasonable performance.

For a simple language compiler is nothing more than a chain "parse -> strip off syntax sugar -> resolve lexical scope -> propagate types -> flatten the code -> emit target code", with each step being independent and trivial, nothing more than a set of tree rewrite rules.

ndesaulniers · on June 22, 2016

Now to make it self hosting. ;)

_vya7 · on June 21, 2016

Nice! If you're looking for another fun challenge, make your language compile into bytecode and execute that instead of an AST. :)

chrisseaton · on June 22, 2016

ASTs are fighting back these days! I work with a system at Oracle for going straight from AST to JIT, with no bytecode, and it's proving to be very fast and very easy.

mungoman2 · on June 22, 2016

That is super cool! Are there any public resources available about this?

chrisseaton · on June 22, 2016

The key paper: http://lafo.ssw.uni-linz.ac.at/papers/2013_Onward_OneVMToRul...

There's implementations of Ruby, JS, R, Python, etc, using it.

I have a blog about the Ruby effort http://chrisseaton.com/rubytruffle/

stevekemp · on June 22, 2016

That's a nice set of readable code. I made a minor change just to add in floats instead of ints, which wasn't too hard to accomplish thanks to your structure.

(Though my changes are ugly).

If/When you get to implementing functions things will get really useful :)

jhedwards · on June 22, 2016

I just realized reading your comment that I did implement functions but I forgot to document them! I will update the readme immediately.

If you look in the main file, huo.c, you will see something called store_defs. That function takes the ASTs of defined functions and stores them in a key-val type store. If someone invokes a user-defined function I just grab the AST, replace the variables with the values they passed in and execute it. That code is in process_defs and is invoked by execute.

stevekemp · on June 22, 2016

Wonderful news. I managed to gloss over that :)

ckaygusu · on June 21, 2016

Nice work.

You may want to take a look at Bison/Flex. While they don't suit the tastes of everyone, it's good to know how to work with them. For example, BASH parser is written using those tools.

analog31 · on June 22, 2016

Nice work. What would take, to make it into a REPL? I haven't looked closely at the code yet, so maybe it's close to being there.

jhedwards · on June 22, 2016

That's a good idea. I don't think it would be hard, you just get the interpreter to run in a loop and read from the keyboard input. I'll put that on the list for upcoming features :-)