Writing your own Lisp-ette is a brilliant evening or weekend project, regardless of the language. It's some of the simplest non-toy parsing you can attempt, a bit of light data structure work, and understanding eval/apply is 80% of the work in implementing it. I would highly recommend anyone to have a go, and try not to follow existing code too closely: figure out the problems in your language of choice.
The post identifies some of its own weaknesses (memory handling, errors), which are quite C-specific, or at least easier to handle in other languages, where you can punt those issues to the host language runtime. But it will be a fun extension to fix them (a job for the second evening / weekend of coding ;) )
But, imho, the beauty of writing a Lisp is that there are a bunch of things you can do from there, some more difficult, but several achievable step-by-step in a day or a few days each. I'd first add a few more special forms than the OP has (quote, if, progn, math operations), then my suggestions:
1. Defining names (if you haven't already), both let and define special forms.
2. Lambdas.
3. Tail call optimisation (I suggest this not because it's an optimisation, this Lisp doesn't need optimising, but because TCO is a bite-sized extension.)
4. Lexical scoping of lambdas.
5. Continuations (call/cc).
6. And if you're really brave (or skilled, or just masochistic), macros.
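On 3., the whole trick is that a tail call in eval doesn't need a new stack frame: instead of recursing, you rebind the locals and jump back to the top of eval. Here's a minimal sketch of that shape in C (the function and names are invented for illustration; a real eval would rebind `expr` and `env` the same way):

```c
#include <assert.h>

/* Hypothetical illustration of the TCO pattern: the recursive form
 *   return eval_countdown(n - 1, acc + n);
 * becomes a rebind-and-loop, so the stack never grows. */
long eval_countdown(long n, long acc)
{
    for (;;) {              /* "top of eval" */
        if (n == 0)
            return acc;     /* base case: return an actual value */
        acc = acc + n;      /* tail call: rebind the arguments... */
        n = n - 1;
        /* ...and fall through to loop instead of recursing */
    }
}
```

The same shape handles `if` in tail position: evaluate the condition, set `expr` to the chosen branch, and continue the loop.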
I was encouraged to do this as a new grad student, and it was one of the most fun and educational experiences I remember. I didn't get as far as macros back then, but implementing call/cc was a definite pivot point in my programming competence.
I found none of these particularly difficult. (I've never attempted 3. or 5. though.) Functional values, however, are difficult. They work effortlessly in interpreted code, but in compiled code you're faced with the upwards funarg problem: https://en.wikipedia.org/wiki/Funarg_problem#Upwards_funarg_...
> They work effortlessly in interpreted code, but in compiled code you're faced with the upwards funarg problem
No, it's not about interpreted vs. compiled. It's about using the native stack or the heap for stack frames. Compiled code and interpreters can each use either, so that's orthogonal.
Yes, to solve the upwards funarg problem you can't rely on a single stack. The problem is that using the heap for everything instead comes with a heavy computational cost, and using the heap only when you need it means you need to identify those circumstances, which isn't trivial.
Another fun one is a Forth interpreter. I tried to make one in Rust once; I can't say I actually succeeded, but it's fun to tinker with. It is simple enough that you can probably write a bad one in assembly without much assembly knowledge.
Not sure why you're downvoted (people don't like swear words?). As a "my first interpreter" project Brainfuck is really quick, fun and could motivate you to explore the topic further.
Just think of them as defining functions that take s-expressions in and spit s-expressions out before "normal" runtime evaluation, and which hence only see built-in symbols.
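In C terms (all names here are invented for illustration, not from the OP's code), a macro is just a function from one cons tree to another, run before eval ever sees the form, e.g. expanding (when COND BODY) into (if COND BODY nil):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A bare-bones s-expression: either an atom (atom != NULL) or a pair. */
typedef struct Cell { const char *atom; struct Cell *car, *cdr; } Cell;

static Cell *mk_atom(const char *s) { Cell *c = calloc(1, sizeof *c); c->atom = s; return c; }
static Cell *mk_cons(Cell *car, Cell *cdr) { Cell *c = calloc(1, sizeof *c); c->car = car; c->cdr = cdr; return c; }

/* The "macro": (when COND BODY) -> (if COND BODY nil).
 * It only rewrites the tree; nothing is evaluated here. */
static Cell *expand_when(Cell *form)
{
    Cell *cond = form->cdr->car;
    Cell *body = form->cdr->cdr->car;
    return mk_cons(mk_atom("if"),
           mk_cons(cond, mk_cons(body, mk_cons(mk_atom("nil"), NULL))));
}
```

Evaluation then proceeds on the rewritten tree as if the user had typed the `if` form directly.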
"This paper describes a modified form of Kohlbecker's algorithm for reliably hygienic (capture-free) macro expansion in block-structured languages, where macros are source-to-source transformations specified using a high-level pattern language."
Lexical scoping is essential if you want to use the Lisp for anything serious. If you start with dynamic scope for everything and the system develops any real complexity, then switching to lexical scoping becomes very painful. Best to do it right the first time.
Here's a small implementation of Kernel in around 700 lines of Haskell [https://github.com/vito/hummus/tree/master/src/Hummus]. Note that it's not Kernel proper, as it uses delimited continuations instead of the guarded undelimited continuations in the Kernel report, plus there's a few other differences and some bugs.
Anyone looking for a more complete implementation, check out klisp [http://klisp.org]. It's not 100%, but has some stuff not included in the Kernel report, some of it lifted from R6RS. The main thing it's missing which it desperately needs is an FFI.
Also, another, more interesting Lisp by the same author: (http://piumarta.com/software/maru/). Maru is basically a lisp where some of the core functions like eval and apply are extensible from user code. There's basically a global map of types to evaluators and applicators, with some functions for you to register your own types and get the evaluation behavior you want.
It seems like arpilisp is inspired by jonesforth[0]. Although not directly stated, the style is similar and the acknowledgements mention Richard Jones. Anyone interested in implementing simple programming languages might also want to take a look at jonesforth.
Someone once suggested to me the easiest way to "bootstrap the world" is to write a Forth implementation in assembly, and then write your Lisp in Forth.
And their opinion was that it was easier to do the Lisp than it was to do the Forth.
Although right now they are trying to improve their C compiler prototype https://github.com/oriansj/M2-Planet before they convert it to assembly.
Since this is written in assembly, is it much faster than a C version, since this one can manage its own stack frames and stack variables and such? I always imagined that's the case and that a Lisp implemented fully in assembly would be the trick to a super fast Lisp that can compete with Go.
An interpreter written in assembly is still an interpreter. Yes, it may have control over stack frames. But so does a compiler, and the code generated by a compiler doesn't incur interpretation overhead. If you want performance, compilation is the way to go. And that's what fast Lisps do.
If you were to implement the same Lisp in C then compare, then maybe the assembly variant would be faster for the reasons you mention. Or maybe not.
Also, I modeled arpilisp after the original Lisp. That's barely a first step, and possibly the wrong first step, for anything non-trivial, including applications requiring a "super fast lisp that can compete with Go."
But I didn't write arpilisp for performance. I wrote it to learn and share. Enjoy!
LISP was created a long time ago. Assembly was the weapon of choice. Thinking Machines, Symbolics and Macsyma might already say: We did that. But, uh, no.
That still evaluates x twice, which can also be a source of bugs. I usually take one of two approaches: either decide that this is a weekend hack and using the macros whenever the expansion isn't obvious in my head is a sign of too much complexity, or use this GCC extension:
#define is_space(x) ({ typeof(x) y = (x); y == ' ' || y == '\n'; })
(Or in this case, turn it into an actual function and let the compiler figure out optimization.)
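To see why the double evaluation matters, here's a hypothetical side-effecting argument fed through the naive (non-statement-expression) macro; the reader callback runs twice for a single "test" (all names here invented for illustration):

```c
#include <assert.h>

/* The naive version, without the GCC ({ ... }) trick: */
#define IS_SPACE_NAIVE(x) ((x) == ' ' || (x) == '\n')

static int eval_count;                 /* how many times the argument ran */

static char next_char(const char **p)  /* side effect: advances the cursor */
{
    eval_count++;
    return *(*p)++;
}
```

With a call like `IS_SPACE_NAIVE(next_char(&s))`, two characters are consumed whenever the first isn't a space. The statement-expression version binds the argument to a local once, so the side effect happens exactly once.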
Constexpr is not necessary. Only required if you want to ensure that a variable is initialized with a pre-computed value. The compiler will optimize in either case if it can, regardless of the constexpr attribute.
I'd argue that is_space() should be isspace(), the standard one from <ctype.h>. Pretty sure it wouldn't make the semantics of the language worse, but it's one less wheel re-invented and makes the code a smidgen easier to read, since there's one less concept to learn in it. Also one less thing to debug ...
While this is very cool, it has at least one buffer overflow vulnerability. There is no bounds checking in gettoken().
Also, the talk about pointers being aligned to "8 bit boundaries" I think means 8 byte boundaries. Memory is not bit-addressable (at least, not in C, on anything popular).
But I don't mean to detract from the project! It is very cool nonetheless :)
That's fun. Here's Jisp, my Lisp implementation in a tiny bit of JavaScript. It supports declaring functions, interop with JavaScript, quoting a list, variable arg lists, and more: http://www.ducklet.com/jisp/.
See also Ben Lynn's implementation[1] in about 100 lines of Haskell. Admittedly it's not a totally fair comparison because it's relying on Haskell's runtime, but it's still an excellent demonstration of Haskell's power and expressiveness compared with a lower level language.
To be fair, it is not just that Haskell is a "high-level language", but that it is a language in the ML family of languages. ML means "Meta Language" and was specifically designed to be used to implement (other) languages.
Awesome article, thanks for that. As someone who almost never even looks at C code, this was very understandable with the inline comments.
What's the reason for using macros instead of real functions? Is this an optimization because macros get inlined at compile time? Does this really bring a lot of value?
In eval, the dynamic intern("if") ... calls, and likewise for all the other interned symbols, should be moved to compile-time global storage of those interns.
Otherwise it looks good.
In reality one would use more tag bits, not just one. Typically 3.
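For illustration (the tag assignments here are invented, not the OP's): with 8-byte-aligned cells the low 3 bits of a pointer are always zero, so they can carry one of up to eight type tags:

```c
#include <assert.h>
#include <stdint.h>

/* Low-3-bit pointer tagging; the tag values are arbitrary examples. */
enum { TAG_CONS = 0, TAG_SYMBOL = 1, TAG_FIXNUM = 2, TAG_MASK = 7 };

static void *tag_ptr(void *p, uintptr_t tag)
{
    return (void *)((uintptr_t)p | tag);     /* stash tag in low bits */
}

static uintptr_t tag_of(const void *p)
{
    return (uintptr_t)p & TAG_MASK;          /* read the tag back */
}

static void *untag(const void *p)
{
    return (void *)((uintptr_t)p & ~(uintptr_t)TAG_MASK);  /* strip tag */
}
```

Every access then goes through untag(), and dispatch in eval becomes a switch on tag_of().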
Oof, this is full to the brim of bad C practices. Use macros way less liberally, don't do this ridiculous pointer tagging thing, and leverage more of the stdlib.
For everyone interested in learning about these types of things, check out Make-A-Lisp project [0]. There's more lines of code, but also more features. The guide is awesome and split into several self-contained steps.
You could also have a look at this essay,
http://lambdaway.free.fr/workshop/?view=lambdacode, following Peter Norvig's lispy, written in less than 300 lines of plain Javascript and working in any web browser.
if (is_pair(cdr(ob))) {
    printf(" ");
    print_obj(cdr(ob), 0);
}
How could this `if` statement ever evaluate to false? We already verified that `cdr(ob) != 0`, and the CDR can never be a plain old string, so isn't this `if` superfluous?
I don't know about this implementation in particular, but in general a cons cell is just an object with two slots in it. You can use it to represent a list (by putting another cons or nil in the cdr), but you can also use it to represent other things. For example, a pair, by putting an arbitrary object in the car and cdr. This is a pretty common technique, see "a-list" for one application.
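A tiny illustration (names invented here, not from this implementation): the same two-slot cell serves both as the list spine and as a key/value pair, which is all an a-list is:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A cons cell is just two slots. */
typedef struct Cons { void *car, *cdr; } Cons;

static Cons *cons(void *car, void *cdr)
{
    Cons *c = malloc(sizeof *c);
    c->car = car;
    c->cdr = cdr;
    return c;
}

/* a-list lookup: walk the spine; each car is itself a (key . value) pair. */
static void *assq(const char *key, Cons *alist)
{
    for (; alist; alist = (Cons *)alist->cdr) {
        Cons *pair = (Cons *)alist->car;
        if (strcmp((const char *)pair->car, key) == 0)
            return pair->cdr;
    }
    return NULL;
}
```

So the cdr of a spine cell is the rest of the list, while the cdr of a pair cell is a value: same structure, two uses.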
Even if that were possible (and with this codebase, I don't think it is), I don't see any code in this function that would print the atom in such a case. It seems like it would just print nothing.
But what about the other way around? Would it be possible to get a C compiler written in Lisp in ~200 lines as well? I mean, not a linker and macro processor etc., just the compiler itself, to get object code.
No calls to free() to match calloc(), no overflow checking in gettoken() and then I read this:
> a program with missing or unmatched parenthesis, unresolved symbols, etc will likely just result in something like a segmentation fault.
I understand this is just a fun thing to hack on, but this is an irresponsible way to write software. I hope no one here is reading this and thinking it's how they should be writing C.
It's a lot longer if it's done properly. You should use an array of dotted pairs to store your list elements, out of which you build a free list, from which you allocate by calling cons. To garbage collect, you mark all objects in use (on the stack, or accessible from the symbol table) and then you add the unmarked conses to the free list. You should not be using malloc or calloc at all, certainly not to implement cons.
Below is some of the relevant code in C/C++ from the virtual machine of Emblem (the Lisp dialect I'm using to implement inter alia my visual dataflow language, Full Metal Jacket). To keep things short I haven't included initialization or the garbage collector. cons_op takes the top two stack entries (one on the stack, the other in tos), conses them, and returns it in tos.
-------------------
typedef struct ListStruct {
void *head;
void *tail;
} *List;
static struct ListStruct consTable[NUM_CONSES];
static List freeStore;
#define hd(X) (((List)(X))->head)
#define tl(X) (((List)(X))->tail)
#define NIL (void *)&consTable[0]
#define null(X) ((X) == NIL)
static void **stack;
static void *tos; /* Top of stack register. */
static long sp;
static void cons_op()
{
List x;
if (null(freeStore)) { cons_gc(); }
x = freeStore;
freeStore = (List)tl(freeStore);
hd(x) = stack[sp++];
tl(x) = tos;
tos = x;
}
----------------------
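For anyone curious what the elided collector might look like, here's a hedged, self-contained sketch of the mark-and-sweep the parent comment describes (an invented miniature, not Emblem's actual collector; a real one would also mark from the stack and symbol table, and would distinguish atoms from cell pointers before recursing):

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CELLS 8

typedef struct Cell { struct Cell *head, *tail; int mark; } Cell;

static Cell table[NUM_CELLS];
static Cell *freelist;

static void mark(Cell *c)
{
    while (c != NULL && !c->mark) {
        c->mark = 1;
        mark(c->head);   /* recurse into the head... */
        c = c->tail;     /* ...iterate down the tail to bound recursion */
    }
}

static int sweep(void)   /* returns how many cells went to the free list */
{
    int freed = 0;
    freelist = NULL;
    for (int i = 0; i < NUM_CELLS; i++) {
        if (table[i].mark) {
            table[i].mark = 0;          /* live: clear mark for next cycle */
        } else {
            table[i].tail = freelist;   /* dead: thread onto the free list */
            freelist = &table[i];
            freed++;
        }
    }
    return freed;
}
```

cons_gc() above would then be mark-the-roots followed by sweep(), and cons_op keeps popping from the rebuilt freelist as before.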
Alternatively, instead of C you could use a garbage collected language such as Java, C#, or Go, and then not have to worry about memory management.
The author isn't responsible if people read it and think it's how they should be writing C, especially when the objective of writing lisp in as little C as possible is communicated.
> The author isn't responsible if people read it and think it's how they should be writing C,
Seems we disagree on the basic premise then. If someone learns the wrong thing based on my example I view myself as responsible. (I have not been a perfect example all the time either. We owe it to ourselves and others to always improve on that front.)
I would add that there are already hundreds of implementations of Lisp in C and hundreds of little tutorials extremely similar to this online anyway; it's not like one more is going to make a difference.
Yes, his C code could be a lot better, but at the end of the day it really doesn't matter and nobody is going to be using this for anything important.
It's not irresponsible, but copy/paste proliferation does exist, and happens surprisingly often with code posted online like this. When you write code for posting online, you'd want to consider this problem.
People often do what they love with the tools they love. Complaining that someone wrote a hobby project with C is akin to complaining that your neighbor made a decorative baseball bat with her lathe instead of a 3D printer.
Writing Lisp in 200 lines of Haskell wouldn’t be nearly as interesting. The whole point is (presumably) that it’s fun to code golf in C because it’s not very expressive. Relax, no one is going to use the OP’s code to run a pacemaker.
Sorry, this is wrong. A segfault means you got lucky and your bad memory access hit an unmapped or otherwise invalid page. Other times the program will keep running with incorrect results. Many such issues represent exploitable bugs.
In any case it's a bad issue and should not be a normal failure mode for a syntax error.
"Other times the program will keep running with incorrect results. Many such issues represent exploitable bugs."
None of which matter if it's a prototype, running on a developers machine.
If you want memory safety you run the program through valgrind anyway with a large input dataset. And write unit tests. And integration tests. And so on.
The baggage of production quality software development environment is so high it easily stifles the joy of quick and dirty prototypes.
The main use of a prototype is to facilitate understanding. That is the most critical constraint, and its needs override everything else.
Besides, who on earth is going to exploit a few hundreds of lines of code a developer runs on his or her own machine?
I don't know if you noticed this but C hackers are a minority on here. Thus it's not unreasonable to think people get an impression of how C is done from submissions here. We owe it to the craft to serve as good examples. It's not as burdensome as you suggest.
I see your point, but respectfully disagree that an incomplete implementation should stop a submitter from sharing his or her discoveries. This is an aesthetic position - I think interesting ideas are more important than clean implementations (It is left as an exercise to the reader... is an ok position, IMO).
I think you are mistaken. There is much to learn from implementations such as this, and the techniques used form the basis of much more complex systems. The technology remains very relevant.
FWIW, I didn't downvote you, as I don't downvote someone simply because I believe they're wrong, or because I disagree with them.
While I agree that it's not just historical interest - there's still much that can be learned from Lisp - this implementation is certainly not the place to learn it. There's no garbage collection, which is really the biggest concern of a proper lisp implementation in a language like C. Without GC, it's a gimmick implementation. There's no practical use for it.
There is much to be learned from incomplete, toy implementations. Not least, the interested reader can see if they can extend the existing implementation to include the missing features.
Insisting that learners/students should only ever read and study complete, perfect implementations is, I think, a mistake. I've learned a great deal from studying, and subsequently improving and extending, implementations that are imperfect and incomplete.