Lisp in fewer than 200 lines of C

sago · on Nov 26, 2017

Writing your own Lisp-ette is a brilliant evening or weekend project, regardless of the language. It's some of the simplest non-toy parsing you can attempt, a bit of light data structure work, and understanding eval/apply is 80% of the work in implementing it. I would highly recommend anyone to have a go, and try not to follow existing code too closely: figure out the problems in your language of choice.

The post identifies some of it's own weaknesses (memory handling, errors), which are quite C specific. Or at least easier to handle in other languages, where you can punt those issues to the host language runtime. But it will be a fun extension to fix them (a job for the second evening / weekend of coding ;) )

But, imho, the beauty of writing a Lisp is that there are a bunch of things you can do from there, some more difficult, but several are achievable step-by-step in a day or a few days each. I'd first add a few special forms more than the OP (quote, if, progn, math operations), then my suggestions:

1. Defining names (if you haven't already), both let and define special forms.

2. Lambdas.

3. Tail call optimisation (I suggest this not because it's an optimisation, this Lisp doesn't need optimising, but because TCO is a bite-sized extension.)

4. Lexical scoping of lambdas.

5. Continuations. call/cc

6. And if you're really brave (or skilled, or just masochistic), macros.

I was encouraged to do this as a new grad student, and it was one of the most fun and educational experiences I remember. I didn't get as far as macros back then, but implementing call/cc was a definite pivot point in my programming competence.

ioddly · on Nov 26, 2017

> Writing your own Lisp-ette is a brilliant evening or weekend project, regardless of the language.

Just be careful about scope creep. I started one as a weekend project five years ago and I'm still not finished ;)

DonaldFisk · on Nov 26, 2017

I found none of these particularly difficult. (I've never attempted 3. or 5. though.) Functional values, however, are difficult. They work effortlessly in interpreted code, but in compiled code you're faced with the upwards funarg problem: https://en.wikipedia.org/wiki/Funarg_problem#Upwards_funarg_...

A solution is needed if you want lazy evaluation.

chrisseaton · on Nov 26, 2017

> They work effortlessly in interpreted code, but in compiled code you're faced with the upwards funarg problem

No it's not about interpreted or compiled. It's about using the native stack or a heap for stack frames. Compiled code and interpreters can both use either, so that's orthogonal.

DonaldFisk · on Nov 27, 2017

Yes, to solve the upwards funarg problem you can't rely on a single stack. The problem is that using the heap for everything instead comes with a heavy computational cost, and using the heap only when you need it means you need to identify those circumstances, which isn't trivial.

sago · on Nov 26, 2017

How did you do lambdas and lexical scoping without 'functional values'?

DonaldFisk · on Nov 26, 2017

That's explained in https://en.wikipedia.org/wiki/Funarg_problem#Downwards_funar...

In addition to return address and dynamic link, you need to store a static link in each stack frame.

theoh · on Nov 26, 2017

Presumably by implementing only the simpler "downwards" funargs: the ability to pass functions into a function scope but never return them.

agolliver · on Nov 26, 2017

5 is do-able by an undergrad, though the first time you encounter the concept it can be difficult to grapple how you would implement it.

roywiggins · on Nov 26, 2017

Another fun one is a Forth interpreter. I tried to make one in Rust once, I can't say I actually succeeded but it's fun to tinker with. It is simple enough that you can probably write a bad one in assembly without that much assembly knowledge.

geon · on Nov 27, 2017

Or Brainfuck. It's fun and only takes half an hour or so.

smcl · on Nov 27, 2017

Not sure why you're downvoted (people don't like swear words?). As a "my first interpreter" project Brainfuck is really quick, fun and could motivate you to explore the topic further.

kovrik · on Nov 27, 2017

Agree.

I'm writing Scheme R5RS in Kotlin (https://github.com/kovrik/scheme-in-kotlin) and have implemented everything except macros (6).

Have no idea how to beat them.

_diyu · on Nov 27, 2017

Just think of them as defining functions that take s expressions in and spit s expressions out before “normal” runtime evaluation and which hence only see built in symbols.

kovrik · on Nov 27, 2017

Yes, I get that (in general).

But then there are things like hygiene, performance and some tricky edge-cases.

And I couldn't find any standard (and simple) algorithm to implement macros (preferably written in something other than Scheme itself).

Still trying to wrap my head around.

UncleEntity · on Nov 27, 2017

"This paper describes a modified form of Kohlbecker’s algo- rithm for reliably hygienic (capture-free) macro expansion in block-structured languages, where macros are source-to- source transformations specified using a high-level pattern language."

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.464...

quotemstr · on Nov 27, 2017

Lexical scoping is essential if you want to use the Lisp for anything essential. If you start with dynamic scope for everything and the system develops any real complexity, then switching to lexical scoping becomes very painful. Best to do it right the first time.

reacweb · on Nov 27, 2017

7 typing and Hindley-Milner inference

tree_of_item · on Nov 26, 2017

If you go with (hygenic) fexprs instead of macros, they're both easier to implement and more expressive, at the expense of performance.

Check out John Shutt's thesis for more information on hygenic fexprs: https://web.wpi.edu/Pubs/ETD/Available/etd-090110-124904/unr...

sparkie · on Nov 26, 2017

Here's a small implementation of Kernel in around 700 lines of Haskell [https://github.com/vito/hummus/tree/master/src/Hummus]. Note that it's not Kernel proper, as it uses delimited continuations instead of the guarded undelimited continuations in the Kernel report, plus there's a few other differences and some bugs.

Anyone looking for a more complete implementation, check out klisp [http://klisp.org]. It's not 100%, but has some stuff not included in the Kernel report, some of it lifted from R6RS. The main thing it's missing which it desperately needs is an FFI.

radarsat1 · on Nov 26, 2017

On the subject of Haskell, what's the shortest Haskell (possibly Haskell subset) interpreter anyone has written?

sago · on Nov 26, 2017

Thanks for the tip. If it wasn't the end of the weekend, I'd be tempted to break out emacs.

piinbinary · on Nov 26, 2017

After reading Peter Norvig's post about a lisp implementation [0], I decided to write one in Python.

I got something working in 65 lines [1]

[0] http://norvig.com/lispy.html

[1] https://gist.github.com/jmikkola/b7c6c644dff1c07891c698f0a52...

Wohlf · on Nov 27, 2017

Thanks for this, I recently did the same and it's nice to have others work to compare to.

akashakya · on Nov 26, 2017

If you like this, you might like Lisp interpreter written in assembly in a single file. It is one of the best commented code ever written imo.

https://github.com/marcpaq/arpilisp

sparkie · on Nov 26, 2017

Another small lisp implementation: http://piumarta.com/software/lysp/

Also, another, more interesting Lisp by the same author: (http://piumarta.com/software/maru/). Maru is basically a lisp where some of the core functions like eval and apply are extensible from user code. There's basically a global map of types to evaluators and applicators, with some functions for you to register your own types and get the evaluation behavior you want.

fasquoika · on Nov 27, 2017

It seems like arpilisp is inspired by jonesforth[0]. Although not directly stated, the style is similar and the acknowledgements mentions Richard Jones. Anyone interested in implementing simple programming languages might also want to take a look at jonesforth.

[0]: https://github.com/nornagon/jonesforth

marcpaq · on Dec 4, 2017

Arpilisp author here.

Yes, jonesforth definitely inspired and influenced me; that's why Jones is first in the Acknowledgements section.

If you're curious, I keep a list of single-file implementations of programming languages (including jonesforth):

https://github.com/marcpaq/b1fipl

nielsbot · on Nov 27, 2017

Someone once suggested to me the easiest way to "bootstrap the world" is to write a Forth implementation in assembly, and then write your Lisp in Forth.

z29LiTp5qUC30n · on Nov 27, 2017

as there is a project aiming to "bootstrap the world" https://savannah.nongnu.org/projects/stage0/ and they hand wrote a FORTH http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage2/f... and a compacting garbage collecting lisp http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage2/l...

and their opinion was that it was easier to do the lisp than it was to do the FORTH. Although right now they are trying to improve their C Compiler prototype https://github.com/oriansj/M2-Planet before they convert it to assembly

_diyu · on Nov 27, 2017

Since this is written in assembly is it much faster than a C version since this one can manage its own stack frames and stack variables and such? I always imagined that’s the case and that a lisp implemented fully in assembly would be the trick to a super fast lisp that can complete with Go.

tom_mellior · on Nov 27, 2017

An interpreter written in assembly is still an interpreter. Yes, it may have control over stack frames. But so does a compiler, and the code generated by a compiler doesn't incur interpretation overhead. If you want performance, compliation is the way to go. And that's what fast Lisps do.

marcpaq · on Dec 4, 2017

Arpilisp author here.

If you were to implement the same Lisp in C then compare, then maybe the assembly variant would be faster for the reasons you mention. Or maybe not.

Also, I modeled arpilisp after the original Lisp. That's barely a first step, and possibly the wrong first step, for anything non-trivial, including applications requiring a "super fast lisp that can compete with Go."

But I didn't write arpilisp for performance. I wrote it to learn and share. Enjoy!

ci5er · on Nov 27, 2017

LISP was created a long time ago. Assembly was the weapon of choice. Thinking Machines, Symbolics and Macsyma might already say: We did that. But, uh, no.

klmr · on Nov 27, 2017

The first LISP implementation was famously done in machine code, not assembly.

kindfellow92 · on Nov 27, 2017

Oof, all the macros are broken:

    #define is_space(x)  (x == ' ' || x == '\n')
    #define is_parens(x) (x == '(' || x == ')')

Should be

    #define is_space(x)  ((x) == ' ' || (x) == '\n')
    #define is_parens(x) ((x) == '(' || (x) == ')')

Probably doesn’t matter in practice for this. It could end up being a nasty source of bug later on in the project.

geofft · on Nov 27, 2017

That still evaluates x twice, which can also be a source of bugs. I usually take one of two approaches: either decide that this is a weekend hack and using the macros whenever the expansion isn't obvious in my head is a sign of too much complexity, or use this GCC extension:

    #define is_space(x) ({ typeof(x) y = x; y == ' ' || y = '\n'; })

(Or in this case, turn it into an actual function and let the compiler figure out optimization.)

kindfellow92 · on Nov 27, 2017

GCC extensions make my brain hurt :( Should just use C++ at that point:

    template<class T>
    bool is_space(const T & x) {
        return x == ‘ ‘ || x == ‘\n’;
    }

pjmlp · on Nov 27, 2017

Even better if you make it:

    template<class T>
    constexpr bool is_space(const T & x) {
        return x == ' ' || x == '\n';
    }

Debuggable, type safe and same performance as straight C code.

kindfellow92 · on Nov 27, 2017

Constexpr is not necessary. Only required if you want to ensure that a variable is initialized with a pre-computed value. The compiler will optimize in either case if it can, regardless of the constexpr attribute.

pjmlp · on Nov 27, 2017

> Only required if you want to ensure that a variable is initialized with a pre-computed value

Yep.

IvanVergiliev · on Nov 27, 2017

And throw an `inline` in there just to be more likely to end up with something macro-like.

atilaneves · on Nov 27, 2017

`constexpr` is implicitly `inline`: http://en.cppreference.com/w/cpp/language/inline

gpderetta · on Nov 28, 2017

orthogonally, templates are also implicitly inline.

Then again, inline does not mean what most people think.

kindfellow92 · on Nov 27, 2017

“inline” is more about linkage than about actual inlining. The compiler will in-line regardless of that attribute.

klmr · on Nov 27, 2017

The function can be trivially implemented in C, since you don’t need a generic input argument type. No idea why OP chose macros in the first place.

unwind · on Nov 27, 2017

I'd argue that is_space() should be isspace(), the standard from <ctypes.h>. Pretty sure it wouldn't make the semantics of the language worse, but it's one less wheel re-invented and makes the code a smidgen easier to read since there's one less concept to learn in it. Also one less thing to debug ...

unwind · on Nov 27, 2017

Heh. Make that <ctype.h>, of course. D'oh. :)

SonOfLilit · on Nov 26, 2017

If you like short interpreter implementations, you might like this: http://code.jsoftware.com/wiki/Essays/Incunabulum

tromp · on Nov 26, 2017

or this interpreter of a lazy lambda calculus based language in 25 lines of (obfuscated) C:

http://www.ioccc.org/2012/tromp/tromp.c

http://www.ioccc.org/2012/tromp/hint.html

agumonkey · on Nov 26, 2017

I love how it demonstrates how recursion/loops and shape/dimension based programming can be both thin, fast and generic

krat0sprakhar · on Nov 26, 2017

Really cool! A more complete implementation in 1000 lines of C for getting started: http://www.buildyourownlisp.com

kshahkshah · on Nov 27, 2017

Can't recommend it enough, I learned a lot from this tutorial. I hope to go back and add a lot more languages features.

jstanley · on Nov 26, 2017

While this is very cool, it has at least one buffer overflow vulnerability. There is no bounds checking in gettoken().

Also, the talk about pointers being aligned to "8 bit boundaries" I think means 8 byte boundaries. Memory is not bit-addressable (at least, not in C, on anything popular).

But I don't mean to detract from the project! It is very cool nonetheless :)

BMorearty · on Nov 26, 2017

That's fun. Here's Jisp, my Lisp implementation in a tiny bit of JavaScript. It supports declaring functions, interop with JavaScript, quoting a list, variable arg lists, and more: http://www.ducklet.com/jisp/.

TimTheTinker · on Nov 27, 2017

Have you taken a look at ClojureScript? You might find it interesting.

lisper · on Nov 26, 2017

Lisp with full lexical closures in ~100 lines of Python:

http://www.flownet.com/ron/lisp/l.py

The interpreter itself is 48 lines.

raphlinus · on Nov 26, 2017

See also Ben Lynn's implementation[1] in about 100 lines of Haskell. Admittedly it's not a totally fair comparison because it's relying on Haskell's runtime, but it's still an excellent demonstration of Haskell's power and expressiveness compared with a lower level language.

[1]: https://crypto.stanford.edu/~blynn/lambda/lisp.html

vog · on Nov 26, 2017

To be fair, it is not just that Haskell is a "high-level language", but that it is a language in the ML family of languages. ML means "Meta Language" and was specifically designed to be used to implement (other) languages.

mweibel · on Nov 27, 2017

Awesome article, thanks for that. As someone who almost never even looks at C code, this was very understandable with the inline comments.

What's the reason for using macros instead of real functions? Is this an optimization because macros get inlined at compile time? Does this really bring a lot of value?

klmr · on Nov 27, 2017

> Does this really bring a lot of value?

In this particular case, none whatsoever. It’s egregious abuse of macros.

agrafix · on Nov 27, 2017

This motivated me to hack together a Haskell implementation, but with a little better error handling :) https://github.com/agrafix/micro-lisp

_ooqq · on Nov 27, 2017

Things that bugs me: cast to (long) when they should use intptr_t.

EDIT: And gettoken() should check against buffer: index < sizeof token.

EDIT 2: And I'd store the tag in a separate variable, because bit abuse in a pointer is plain and simply asking for problems.

rurban · on Nov 26, 2017

In the eval the dynamic intern("if") ... for all interned symbols should be moved to compile-time global storage of those interns. Otherwise it looks good. In reality one would use more tag bits, no just one. Typically 3.

woadwarrior01 · on Nov 26, 2017

Reminds of this book[1] I’d read a couple of years ago.

[1]: http://www.buildyourownlisp.com/

ddevault · on Nov 27, 2017

Oof, this is full to the brim of bad C practices. Use macros way less liberally, don't do this ridiculous pointer tagging thing, and leverage more of the stdlib.

petethepig · on Nov 26, 2017

For everyone interested in learning about these types of things, check out Make-A-Lisp project [0]. There's more lines of code, but also more features. The guide is awesome and split into several self-contained steps.

[0] https://github.com/kanaka/mal/blob/master/process/guide.md

martyalain · on Nov 28, 2017

You could also have a look at this essay, http://lambdaway.free.fr/workshop/?view=lambdacode, following Peter Norvig's lispy, written in less than 300 lines of plain Javascript and working in any web browser.

Your opinion is welcome. Alain Marty

raldi · on Nov 27, 2017

Toward the end of `print_obj()`, we see:

        if (is_pair(cdr(ob))) {
          printf(" ");
          print_obj(cdr(ob), 0);
        }

How could this `if` statement ever evaluate to false? We already verified that `cdr(ob) != 0`, and the CDR can never be a plain old string, so isn't this `if` superfluous?

orodley · on Nov 27, 2017

I don't know about this implementation in particular, but in general a cons cell is just an object with two slots in it. You can use it to represent a list (by putting another cons or nil in the cdr), but you can also use it to represent other things. For example, a pair, by putting an arbitrary object in the car and cdr. This is a pretty common technique, see "a-list" for one application.

rurban · on Nov 27, 2017

Nope. It still can be an atom. Only if it's a pair (i.e. a cons with two cells, not just one) it prints the next.

raldi · on Nov 27, 2017

Even if that were possible (and with this codebase, I don't think it is), I don't see any code in this function that would print the atom in such a case. It seems like it would just print nothing.

rurban · on Nov 28, 2017

Look harder, it does. There are only cons (pair) and atoms.

raldi · on Nov 28, 2017

The author just confirmed, the if-statement does nothing: https://github.com/carld/micro-lisp/issues/9

raldi · on Nov 28, 2017

Can you give me an example program that, when fed as input, would cause this if-statement to evaluate to false?

oneplane · on Nov 26, 2017

But what about the other way around? Would it be possible to get a C compiler written in Lisp in ~200 lines as well? I mean, not a linker and macro processor etc, just the compiler to get objects.

sigjuice · on Nov 27, 2017

Nice project that I will spend some time studying. However, fewer than 200 lines is not really a virtue, IMHO.

asveikau · on Nov 26, 2017

No calls to free() to match calloc(), no overflow checking in gettoken() and then I read this:

> a program with missing or unmatched parenthesis, unresolved symbols, etc will likely just result in something like a segmentation fault.

I understand this is just a fun thing to hack on, but this is an irresponsible way to write software. I hope no one here is reading this and thinking it's how they should be writing C.

DonaldFisk · on Nov 26, 2017

It's a lot longer if it's done properly. You should use an array of dotted pairs to store your list elements, out of which you build a free list, from which you allocate by calling cons. To garbage collect, you mark all objects in use (on the stack, or accessible from the symbol table) and then you add the unmarked conses to the free list. You should not be using malloc or calloc at all, certainly not to implement cons.

Below is some of the relevant code in C/C++ from the virtual machine of Emblem (the Lisp dialect I'm using to implement inter alia my visual dataflow language, Full Metal Jacket). To keep things short I haven't included initialization or the garbage collector. cons_op takes the top two stack entries (one on the stack, the other in tos), conses them, and returns it in tos.

-------------------

    typedef struct ListStruct {
        void *head;
        void *tail;
    } *List;

    static struct ListStruct consTable[NUM_CONSES];

    static List freeStore;

    #define hd(X) (((List)(X))->head)
    #define tl(X) (((List)(X))->tail)
    #define NIL (void *)&consTable[0]
    #define null(X) ((X) == NIL)

    static void **stack;
    static void *tos; /* Top of stack register. */
    static long sp;

    static void cons_op()
    {
        List x;

        if (null(freeStore)) { cons_gc(); }
        x = freeStore;
        freeStore = (List)tl(freeStore);
        hd(x) = stack[sp++];
        tl(x) = tos;
        tos = x;
    }

----------------------

Alternatively, instead of C you could use a garbage collected language such as Java, C#, or Go, and then not have to worry about memory management.

ac2u · on Nov 26, 2017

The author isn't responsible if people read it and think it's how they should be writing C, especially when the objective of writing lisp in as little C as possible is communicated.

So why is it fair to label it irresponsible?

asveikau · on Nov 26, 2017

> The author isn't responsible if people read it and think it's how they should be writing C,

Seems we disagree on the basic premise then. If someone learns the wrong thing based on my example I view myself as responsible. (I have not been a perfect example all the time either. We owe it to ourselves and others to always improve on that front.)

DSMan195276 · on Nov 26, 2017

I would add that there is already hundreds of implementations of Lisp in C and hundreds of little tutorials that are extremely similar to this online anyway, it's not like one more is going to make a difference.

Yes, his C code could be a lot better, but at the end of the day it really doesn't matter and nobody is going to be using this for anything important.

chii · on Nov 26, 2017

it's not irresponsible, but copy/paste proliferation does exist, and happens surprisingly often with code posted online like this. When you write code for posting online, you'd want to consider this problem.

sillysaurus3 · on Nov 26, 2017

If you have to worry about the entire world whenever you do anything, you'll never get anything done.

ythn · on Nov 26, 2017

If people always wrote software responsibly we'd have a great deal fewer interesting projects on GitHub

tree_of_item · on Nov 26, 2017

Would it really be so hard to do this exact same thing in a memory safe language? Would that stifle the author's creativity in any way?

macintux · on Nov 26, 2017

People often do what they love with the tools they love. Complaining that someone wrote a hobby project with C is akin to complaining that your neighbor made a decorative baseball bat with her lathe instead of a 3D printer.

wyager · on Nov 26, 2017

Writing Lisp in 200 lines of Haskell wouldn’t be nearly as interesting. The whole point is (presumably) that it’s fun to code golf in C because it’s not very expressive. Relax, no one is going to use the OP’s code to run a pacemaker.

fsloth · on Nov 26, 2017

"but this is an irresponsible way to write software"

Segfault is a totally safe way to terminate a process on a modern desktop os. Besides - there is no 'correct' way to write software.

asveikau · on Nov 26, 2017

Sorry, this is wrong. A segfault means you got lucky and your bad memory access hit an unmapped or otherwise invalid page. Other times the program will keep running with incorrect results. Many such issues represent exploitable bugs.

In any case it's a bad issue and should not be a normal failure mode for a syntax error.

fsloth · on Nov 26, 2017

"Other times the program will keep running with incorrect results. Many such issues represent exploitable bugs."

None of which matter if it's a prototype, running on a developers machine.

If you want memory safety you run the program through valgrind anyway with a large input dataset. And write unit tests. And integration tests. And so on.

The baggage of production quality software development environment is so high it easily stifles the joy of quick and dirty prototypes.

The main use of prototype is to facilitate understanding. This is the most critical constraint, whose needs drive over anything else.

Besides, who on earth is going to exploit a few hundreds of lines of code a developer runs on his or her own machine?

asveikau · on Nov 26, 2017

I don't know if you noticed this but C hackers are a minority on here. Thus it's not unreasonable to think people get an impression of how C is done from submissions here. We owe it to the craft to serve as good examples. It's not as burdensome as you suggest.

fsloth · on Nov 26, 2017

I see your point, but respectfully disagree that an incomplete implementation should stop a submitter from sharing his or her discoveries. This is an aesthetic position - I think interesting ideas are more important than clean implementations (It is left as an exercise to the reader... is an ok position, IMO).

agumonkey · on Nov 26, 2017

I think people were referring to the free-free approach where you just quit a short lived program without deallocating anything.

Kenji · on Nov 26, 2017

If Lisp is so easy so implement, is it also easy to learn? (I am familiar with Haskell)

on Nov 26, 2017

[dead]

luckydata · on Nov 26, 2017

somebody's machine learning algorithms need better training.

anonfunction · on Nov 26, 2017

Wrong thread?

drdrey · on Nov 26, 2017

Look at "denisehilton"'s post history, it's all spam

Areading314 · on Nov 26, 2017

This is more of historical interest than of technological interest...

ColinWright · on Nov 26, 2017

I think you are mistaken. There is much to learn from implementations such as this, and the techniques used form the basis of much more complex systems. The technology remains very relevant.

FWIW, I didn't downvote you, as I don't downvote someone simply because I believe they're wrong, or because I disagree with them.

sparkie · on Nov 26, 2017

While I agree that it's not just historical interest - there's still much that can be learned from Lisp - this implementation is certainly not the place to learn it. There's no garbage collection, which is really the biggest concern of a proper lisp implementation in a language like C. Without GC, it's a gimmick implementation. There's no practical use for it.

ColinWright · on Nov 26, 2017

There is much to be learned from incomplete, toy implementations. Not least, the interested reader can see if they can extend the existing implementation to include the missing features.

Insisting that learners/students should only ever read and study complete, perfect implementations is, I think, a mistake. I've learned a great deal from studying, and subsequently improving and extending, implementations that are imperfect and incomplete.