Back in 2006, when I was studying CS at Rollins College, I wrote a Scheme (a subset of Lisp) interpreter that also shows a visual representation of linked lists and function calls. You can check it out here: http://davidpilo.com/pvts/
The interpreter was written in Java, and while it does not support the full R5RS grammar, it supports quite a bit of it (see http://davidpilo.com/pvts/language.html).
Cheers!
> One of the interesting debates in Lisp circles is the question of whether or not SchemeLanguage should be considered part of the LispLanguage family. There are more than a few Lispers (and perhaps a few schemers) who think that the LispSchemeDifferences are sufficiently large that Scheme should not be considered to be a "Lisp". This topic has spread across many pages, so this page has been created to quarantine this particular HolyWar.
And one interesting reason why not:
> The philosophies of the Lisp and Scheme communities have diverged quite a bit. SchemeLanguage focuses much more on FunctionalProgramming; CommonLisp on metaprogramming and multi-paradigm programming (especially OO with CommonLispObjectSystem).
I did a MAL implementation too, and am currently working on an LLVM compiler, but dear god I'd never frame it as "spent x years learning to make a compiler".
MAL is easy and straightforward. I'm not saying this to devalue your work (great job!) but to point out that the language is by design simple to implement and doesn't require any prior knowledge of how an interpreter works. People who want to write their own implementation should try it right now, and definitely not be scared off by "I spent 10 years working on this". Turning that interpreter into a compiler is an extra step on top.
Well, it says they didn't discover MAL until 2018, after they'd already been working on compilers for quite a few years. So yes, their implementation of MAL didn't take 10 years. They also wrote an interpreter in Rust and then a compiler in C in the time after discovering MAL. Safe to say the MAL portion takes quite a bit less than 10 years.
I’ve followed BuildYourOwnLisp[0] in the past, so it’s cool to see something that focuses more on compiling Lisp than on implementing it as a language.
Ha! If only I had known about 'Build Your Own Lisp' three months ago!
I needed a simple language as a vehicle for a compiler talk I'm preparing for this summer, so I hacked together an extremely reduced LISP (a 'Non-LISP', as I call it), whose from-scratch C++ implementation came out at about 1K to 1.2K LOC with the AST optimizations I meant to demonstrate for the talk. Self-contained code here -- no libs or (much) STL: https://github.com/blu/tinl
OP, thank you for sharing Tim Morgan's work -- it's a labor of love!
Nope, I'm seeing clasp for the first time -- Christian Schafmeister's talk was extremely nice to listen to!
I did come across some small LISP implementations at the early stages, but by that time I already had the AST builder done. Maybe that's because I didn't actively search for LISP implementations; I didn't need a 'proper' LISP per se, more of a DSL for quickly writing ASTs of arbitrarily complex computational expressions. Those ASTs were the final goal ; )
I wish I'd been given that advice before I wrote a compiler. Back then, the best advice was in Richard Bornat's Understanding and Writing Compilers and the dragon book - useful, but not a walkthrough.
> Writing C code and trying to keep it indented was a bit of a pain and I wish I would have done something else. I believe some compilers write ugly code and then “pretty it up” with a library before writing it out. This is something to explore!
+1 for clang-format. I have been using it as a beautifier on generated code in a TypeScript to C++ compiler.
There are some other cleanups I wish it did on my generated code. For example, it doesn't remove superfluous parentheses. It doesn't remove unused labels. It doesn't remove superfluous semicolons (a line consisting of just a semicolon). And so on. (I should, of course, just be building up an AST and pretty-printing the AST instead.)
Great post! I started writing a Lisp interpreter in Go with the mal guide but switched to Norvig's (How to Write a (Lisp) Interpreter (in Python)) [0] since, as a Lisp beginner, I found it more approachable. I'm looking forward to writing a compiler next, like the author.
Nice write up! It coincides a bit with my progression: I first wrote a scheme interpreter, then a scheme to LLVM IR compiler, and now I'm working on a custom lisp JIT compiler (the JIT bit plays well into macro expansion). It's interesting how that path naturally unfolds.
Lately, I experimented with a 'functional' scripting language that uses lazy evaluation for generating C code and does not do any concatenation. It has a built-in debugger, which lets you break at output locations; that proved quite handy when debugging code that generates code.
See: https://github.com/FransFaase/cyclonedds/commits/xtypes_ts
Lisp is a better choice than most for a first compiler/interpreter, but I would recommend starting even further down the complexity scale with Forth [0], since it lowers the risk of running out of motivation or losing track of the goal.
I think Lisp is a great choice for a first interpreter, but a hard one for a first compiler. Macros make the separation between "compile time" and "runtime" really squirrely. For a first compiled language, I would do something like Pascal.
But, personally, I would really like to write a compiler for a Lisp specifically because I find the handling of macros really confusing and I don't think I'll understand it better until I actually do it.
In the usual design, Lisp compilers begin their job when macro-expansion is done; the input is fully expanded code with no more macros to be invoked. This separation is crystal clear; nothing squirrely about it.
If you bootstrap a Lisp compiler by writing a Lisp interpreter first, with macros and all, then there is nothing left to do, macro-wise, when you set out to make the compiler.
You can freely use macros in that compiler's own source code, too.
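Sketched in Common Lisp, the front end of such a compiler simply expands each form it meets before dispatching, so code generation never sees a macro call. (compile-form and the emit-* helpers are invented names for illustration; the emit-* routines are assumed to recurse via compile-form for subforms.)

  (defun compile-form (form env)
    (let ((form (macroexpand form env)))  ; no macros survive this point
      (cond ((atom form) (emit-var-or-constant form env))
            ((eq (car form) 'quote) (emit-constant (cadr form)))
            ((eq (car form) 'if) (emit-if form env))  ; other special forms elided
            (t (emit-call form env)))))               ; everything else is a call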
There is an interaction between macros and compiling in the larger picture. When you have a working compiler and you integrate it into a file compiler, now your macros are executing at a bit of a different time: when your file compiler reads a file of code, it has to run the macros to expand it. Then when the compiled code is loaded, the macro calls no longer exist. The compiling happens in the development environment, whereas the loading may be in the deployed environment which may be a completely different system. So macros that assume they are executing in the production environment (due to being interpreted) are in for a rude surprise: they are now run in the build environment due to the shiny new compiler. That kind of thing can introduce "squirrely" issues.
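A tiny made-up illustration of that rude surprise:

  ;; Interpreted, this reads the clock on the machine running the
  ;; program; compiled, the number is computed once on the build
  ;; machine at expansion time and baked into the emitted code.
  (defmacro build-stamp ()
    (get-universal-time))  ; runs at macro-expansion time

  (defun report ()
    (list :stamp (build-stamp)))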
Another squirrely issue is what counts as a "top-level form". Suppose we have
  (progn (defmacro foo ...) (foo ...))
If the (foo ...) call is to be able to refer to that macro definition, the definition must be evaluated so that it takes effect! But that means evaluation must be interleaved with macro expansion to some extent; we can't just expand the whole (progn ...) all at once.
The way out of this is to have requirements like: (1) any form not enclosed in another form is a top-level form and (2) if a (progn ...) is a top-level form, then its constituents are considered to be individual top-level forms, and (3) each top-level form is individually macro-expanded and compiled/evaluated.
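A sketch of a file compiler honoring those three requirements (process-toplevel and compile-and-emit are invented names):

  (defun process-toplevel (form)
    (cond ((and (consp form) (eq (car form) 'progn))
           ;; rule (2): the progn's constituents are individual
           ;; top-level forms
           (mapc #'process-toplevel (cdr form)))
          (t
           ;; rule (3): a macro definition takes effect before any
           ;; later top-level form is expanded, so (foo ...) works
           (when (and (consp form) (eq (car form) 'defmacro))
             (eval form))
           (compile-and-emit form))))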
Scheme is wayyyy simpler than Forth IMO (even when you include macros and call/cc). I wrote a minimal stack-based language years ago and I still cannot get a handle on Forth. The inner interpreter and immediate words? Even after reading Let over Lambda, Forth is Greek to me.
But, to be fair, getting to the point of being able to implement a recursive fibonacci algorithm is not too difficult taking either path.
Forth is really a kind of assembly language for a virtual machine, and not really a higher level language. It looks higher level because it doesn't have registers.
Register machine languages are "catenative" also. We can take a "mov eax, ebx" and catenate it with "mov ecx, $foo" in any order we want. We are constrained, though, because different sections of code use different input and output registers (or combinations of registers and stack locations and such). Forth makes those conventions uniform, and that leads to some degrees of freedom which give it a higher-level flavor.
The thing to do is to make some higher level language that uses Forth as a VM; then you have understandable code.
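As a toy sketch of that idea (sticking with Lisp as this thread's notation), lowering simple expressions onto a Forth-like stack machine is nearly a one-liner: emit the operands, then the word that consumes them.

  (defun to-stack-code (expr)
    (if (atom expr)
        (list expr)                                  ; push a literal/variable
        (append (mapcan #'to-stack-code (cdr expr))  ; operands first
                (list (car expr)))))                 ; then the word

  ;; (to-stack-code '(+ 1 (* 2 3)))  =>  (1 2 3 * +)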
I wonder if the approach of making a compiler from an interpreter leads to lots of suboptimal code.
One point I can think of: in an interpreter you're presumably keeping track of the program's variables in your own data structures instead of just throwing pointers around. Though this might be unavoidable with a dynamic source language.
It certainly leads to code for a different purpose. In an interpreter you need to store enough information to choose the right subroutine to call at runtime. This must also include enough information as inputs to let the subroutine produce the right results. Generally I think of this as a reference to code, references to data, and a reference to where to store the results. For a compiler you need all that information to decide what code to emit, but you don't need to keep it around when the emitted code runs, unless you want to debug or to replicate at runtime decisions that were already resolved at compile time.
For example, if a compiler can prove that an index into a block of memory will only ever address elements inside that block, there is no need to emit a runtime check against out-of-range references. An interpreter performing the same operation might still check, because the code doing the interpretation is shared across every indexing/addressing operation, rather than being specialized to that one site the way compiler-produced code can be.
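A sketch of the contrast (execute-index, compile-index, and emit are invented names):

  ;; Interpreter: one shared routine serves every indexing site in the
  ;; program, so it must check every time.
  (defun execute-index (vector i)
    (unless (< -1 i (length vector))
      (error "index ~D out of range" i))
    (aref vector i))

  ;; Compiler: the decision is made per call site, so a site with a
  ;; provably in-range index gets the unchecked access.
  (defun compile-index (vector-form index-form provably-in-range-p)
    (emit (if provably-in-range-p
              `(unchecked-aref ,vector-form ,index-form)
              `(checked-aref ,vector-form ,index-form))))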
You basically need the same data structures in an interpreter and a compiler; when compiling you don't need them at run time, but you do need them at compile time. Lisp in Small Pieces goes seamlessly from interpreters to compilers over a few chapters, and everything from the interpreter part is applicable to the compilers, just split between compile time and run time (it's a great book, BTW).
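For instance (a sketch with invented names), both sides can use the same name-to-location environment; what differs is when it gets consulted:

  ;; Interpreter: the environment is consulted while the program runs.
  (defun interpret-ref (name env)
    (cdr (assoc name env)))               ; env: ((x . 42) (y . 7) ...)

  ;; Compiler: the same structure is consulted while emitting code;
  ;; only the emitted (stack-ref n) survives to run time.
  (defun compile-ref (name cenv)
    (let ((slot (cdr (assoc name cenv)))) ; cenv: ((x . 0) (y . 1) ...)
      `(stack-ref ,slot)))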