Back in 2006, when I was studying CS at Rollins College, I wrote a Scheme (a subset of Lisp) interpreter that also shows a visual representation of linked lists and function calls. You can check it out here: http://davidpilo.com/pvts/
The interpreter was written in Java, and while it does not support the full R5RS grammar, it supports quite a bit of it (see http://davidpilo.com/pvts/language.html).
Cheers!
> One of the interesting debates in Lisp circles is the question of whether or not SchemeLanguage should be considered part of the LispLanguage family. There are more than a few Lispers (and perhaps a few schemers) who think that the LispSchemeDifferences are sufficiently large that Scheme should not be considered to be a "Lisp". This topic has spread across many pages, so this page has been created to quarantine this particular HolyWar.
And one interesting reason why not:
> The philosophies of the Lisp and Scheme communities have diverged quite a bit. SchemeLanguage focuses much more on FunctionalProgramming; CommonLisp on metaprogramming and multi-paradigm programming (especially OO with CommonLispObjectSystem).
I did a MAL implementation too, and am currently working on an LLVM compiler, but dear god I'd never frame it as "spent x years learning to make a compiler".
MAL is easy and straightforward. I'm not saying this to devalue your work (great job!) but to point out that the language is by design simple to implement and doesn't require any prior knowledge of how an interpreter works. People who want to write their own implementation should try it right now, and definitely not be scared off by "I spent 10 years working on this". Turning that interpreter into a compiler is an extra step on top.
Well, it says they didn't discover MAL until 2018, after they'd already been working on compilers for quite a few years. So yes, their implementation of MAL didn't take 10 years. They also wrote an interpreter in Rust and then a compiler in C in the time after discovering MAL. Safe to say the MAL portion takes quite a bit less than 10 years.
I’ve followed BuildYourOwnLisp[0] in the past, so it’s cool to see something that focuses more on compiling Lisp than on implementing it as a language.
Ha! If only I had known about 'Build Your Own Lisp' three months ago!
I needed a simple language as a vehicle for a compiler talk I'm preparing for this summer, so I hacked together an extremely reduced LISP (a 'Non-LISP', as I call it), whose from-scratch C++ implementation came out at about 1K to 1.2K LOC with the AST optimizations I meant to demonstrate for the talk. Self-contained code here -- no libs or (much) STL: https://github.com/blu/tinl
OP, thank you for sharing Tim Morgan's work -- it's a labor of love!
Nope, I'm seeing clasp for the first time -- Christian Schafmeister's talk was extremely nice to listen to!
I did come across some small LISP implementations at the early stages, but by that time I already had the AST builder done. Maybe that's because I didn't actively search for LISP implementations; I didn't need a 'proper' LISP per se, more of a DSL for quickly writing ASTs of arbitrarily complex computational expressions. Those ASTs were the final goal ; )
I wish I'd been given that advice before I wrote a compiler. Back then, the best advice was in Richard Bornat's Understanding and Writing Compilers and the dragon book - useful, but not a walkthrough.
> Writing C code and trying to keep it indented was a bit of a pain and I wish I would have done something else. I believe some compilers write ugly code and then “pretty it up” with a library before writing it out. This is something to explore!
+1 for clang-format. I have been using it as a beautifier on generated code in a TypeScript to C++ compiler.
There are some other cleanups I wish it did on my generated code. For example, it doesn't remove superfluous parentheses. It doesn't remove unused labels. It doesn't remove superfluous semicolons (a line consisting of just a semicolon). And so on. (I should, of course, just be building up an AST and pretty-printing the AST instead.)
Great post! I started writing a Lisp interpreter in Go with the mal guide but switched to Norvig's (How to Write a (Lisp) Interpreter (in Python)) [0] since, as a Lisp beginner, I found it more approachable. I'm looking forward to writing a compiler next, like the author.
Nice write up! It coincides a bit with my progression: I first wrote a scheme interpreter, then a scheme to LLVM IR compiler, and now I'm working on a custom lisp JIT compiler (the JIT bit plays well into macro expansion). It's interesting how that path naturally unfolds.
Lately, I experimented with a 'functional' scripting language that uses lazy evaluation for generating C code and does not do any concatenation. It has a built-in debugger, which lets you break at output locations; that proved quite handy when debugging code that generates code.
See: https://github.com/FransFaase/cyclonedds/commits/xtypes_ts
Lisp is a better choice than most for a first compiler/interpreter, but I would recommend starting even further down the complexity scale with Forth [0], since it lowers the risk of running out of motivation or losing track of the goal.
I think Lisp is a great choice for a first interpreter, but a hard one for a first compiler. Macros make the separation between "compile time" and "runtime" really squirrely. For a first compiled language, I would do something like Pascal.
But, personally, I would really like to write a compiler for a Lisp specifically because I find the handling of macros really confusing and I don't think I'll understand it better until I actually do it.
In the usual design, Lisp compilers begin their job when macro-expansion is done; the input is fully expanded code with no more macros to be invoked. This separation is crystal clear; nothing squirrely about it.
If you bootstrap a Lisp compiler by writing a Lisp interpreter first, with macros and all, then there is nothing left to do, macro-wise, when you set out to make the compiler.
You can freely use macros in that compiler's own source code, too.
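Sketched in Common Lisp, the front end of such a compiler simply expands each form it meets before dispatching, so code generation never sees a macro call. (compile-form and the emit-* helpers are invented names for illustration; the emit-* routines are assumed to recurse via compile-form for subforms.)

  (defun compile-form (form env)
    (let ((form (macroexpand form env)))  ; no macros survive this point
      (cond ((atom form) (emit-var-or-constant form env))
            ((eq (car form) 'quote) (emit-constant (cadr form)))
            ((eq (car form) 'if) (emit-if form env))  ; other special forms elided
            (t (emit-call form env)))))               ; everything else is a call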
There is an interaction between macros and compiling in the larger picture. When you have a working compiler and you integrate it into a file compiler, now your macros are executing at a bit of a different time: when your file compiler reads a file of code, it has to run the macros to expand it. Then when the compiled code is loaded, the macro calls no longer exist. The compiling happens in the development environment, whereas the loading may be in the deployed environment which may be a completely different system. So macros that assume they are executing in the production environment (due to being interpreted) are in for a rude surprise: they are now run in the build environment due to the shiny new compiler. That kind of thing can introduce "squirrely" issues.
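A tiny made-up illustration of that rude surprise:

  ;; Interpreted, this reads the clock on the machine running the
  ;; program; compiled, the number is computed once on the build
  ;; machine at expansion time and baked into the emitted code.
  (defmacro build-stamp ()
    (get-universal-time))  ; runs at macro-expansion time

  (defun report ()
    (list :stamp (build-stamp)))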
Another squirrely issue is what counts as a "top-level form". Suppose we have
  (progn (defmacro foo ...) (foo ...))
If the (foo ...) call is to be able to refer to that macro definition, the definition must be evaluated so that it takes effect! But that means evaluation must be interleaved with macro expansion to some extent; we can't just expand the whole (progn ...) all at once.
The way out of this is to have requirements like: (1) any form not enclosed in another form is a top-level form and (2) if a (progn ...) is a top-level form, then its constituents are considered to be individual top-level forms, and (3) each top-level form is individually macro-expanded and compiled/evaluated.
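A sketch of a file compiler honoring those three requirements (process-toplevel and compile-and-emit are invented names):

  (defun process-toplevel (form)
    (cond ((and (consp form) (eq (car form) 'progn))
           ;; rule (2): the progn's constituents are individual
           ;; top-level forms
           (mapc #'process-toplevel (cdr form)))
          (t
           ;; rule (3): a macro definition takes effect before any
           ;; later top-level form is expanded, so (foo ...) works
           (when (and (consp form) (eq (car form) 'defmacro))
             (eval form))
           (compile-and-emit form))))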
Scheme is wayyyy simpler than Forth IMO (even when you include macros and call/cc). I wrote a minimal stack-based language years ago and I still cannot get a handle on Forth. The inner interpreter and immediate words? Even after reading Let over Lambda, Forth is Greek to me.
But, to be fair, getting to the point of being able to implement a recursive fibonacci algorithm is not too difficult taking either path.
Forth is really a kind of assembly language for a virtual machine, and not really a higher level language. It looks higher level because it doesn't have registers.
Register machine languages are "catenative" also. We can take a "mov eax, ebx" and catenate it with "mov ecx, $foo" in any order we want. We are constrained, though, because different sections of code use different input and output registers (or combinations of registers and stack locations and such). Forth makes those conventions uniform, and that leads to some degrees of freedom which give it a higher-level flavor.
The thing to do is to make some higher level language that uses Forth as a VM; then you have understandable code.
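As a toy sketch of that idea (sticking with Lisp as this thread's notation), lowering simple expressions onto a Forth-like stack machine is nearly a one-liner: emit the operands, then the word that consumes them.

  (defun to-stack-code (expr)
    (if (atom expr)
        (list expr)                                  ; push a literal/variable
        (append (mapcan #'to-stack-code (cdr expr))  ; operands first
                (list (car expr)))))                 ; then the word

  ;; (to-stack-code '(+ 1 (* 2 3)))  =>  (1 2 3 * +)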
I wonder if the approach of making a compiler from an interpreter leads to lots of suboptimal code.
One point I can think of: in an interpreter you're presumably keeping track of the program's variables in your own data structures instead of just throwing pointers around. Though this might be unavoidable with a dynamic source language.
It certainly leads to code for a different purpose. In an interpreter you need to store enough information to choose the right subroutine to call at runtime. This must also include enough information as inputs to let the subroutine produce the right results. Generally I think of this as a reference to code, references to data, and a reference to where to store the results. For a compiler you need all that information to decide what code to emit, but you don't need to keep it around when the emitted code runs, unless you want to debug or to replicate at runtime decisions that were already resolved at compile time.
For example, if a compiler can prove that an index into a block of memory will only ever address elements inside that block, there is no need to emit a runtime check against out-of-range references. An interpreter performing the same operation might still check, because the code doing the interpretation is shared across every indexing/addressing operation, rather than being specialized to that one site the way compiler-produced code can be.
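A sketch of the contrast (execute-index, compile-index, and emit are invented names):

  ;; Interpreter: one shared routine serves every indexing site in the
  ;; program, so it must check every time.
  (defun execute-index (vector i)
    (unless (< -1 i (length vector))
      (error "index ~D out of range" i))
    (aref vector i))

  ;; Compiler: the decision is made per call site, so a site with a
  ;; provably in-range index gets the unchecked access.
  (defun compile-index (vector-form index-form provably-in-range-p)
    (emit (if provably-in-range-p
              `(unchecked-aref ,vector-form ,index-form)
              `(checked-aref ,vector-form ,index-form))))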
You basically need the same data structures in an interpreter and a compiler; when compiling you don't need them at run time, but you do need them at compile time. Lisp in Small Pieces goes seamlessly from interpreters to compilers over a few chapters, and everything from the interpreter part is applicable to the compilers, just split between compile time and run time (it's a great book, BTW).
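For instance (a sketch with invented names), both sides can use the same name-to-location environment; what differs is when it gets consulted:

  ;; Interpreter: the environment is consulted while the program runs.
  (defun interpret-ref (name env)
    (cdr (assoc name env)))               ; env: ((x . 42) (y . 7) ...)

  ;; Compiler: the same structure is consulted while emitting code;
  ;; only the emitted (stack-ref n) survives to run time.
  (defun compile-ref (name cenv)
    (let ((slot (cdr (assoc name cenv)))) ; cenv: ((x . 0) (y . 1) ...)
      `(stack-ref ,slot)))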