In my experience, many people will miss the significance of this, or will simply fail to grasp how this is implementing a programming language at all. After all, you're taking advantage of the host language for every single aspect of the new language, even tokenizing/parsing.
More interesting for beginners (in my opinion) are examples implementing a simple tokenizer, parser, and compiler to another source language. This seems to be what closes the gap between programming languages as mystical constructs and programming languages as programs themselves.
This point of view does tend to displease the traditional SICP crowd though :-).
I'm inclined to agree. Implementing a Lisp in a Lisp always involves a certain amount of hand-waving, without the feeling that I've really made something. Look, here's how to implement a JavaScript interpreter in JavaScript:
eval(someCode);
What I like is seeing all of the steps a typical real-world language implementation goes through, just in minimal form. To that end, a while back I wrote an interpreter for a BASIC-like language in a single Java source file. It tokenizes, parses, and interprets. It uses common real-world techniques for all of those:
It uses a hand-rolled state machine for tokenizing. Recursive descent for parsing. And the interpreter uses the visitor pattern to walk the AST.
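Those three techniques can be sketched in a few dozen lines. This is not the Java interpreter described above, just an illustrative Python miniature of the same pipeline (a hand-rolled tokenizer, recursive descent, and a visitor over the AST) for arithmetic like `1 + 2 * 3`; all names here are made up for the example:

```python
def tokenize(src):
    """Hand-rolled scanner: walk the string, switching on the current character."""
    tokens, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c.isdigit():                     # "number" state: consume digits
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("NUM", int(src[i:j])))
            i = j
        elif c in "+*":
            tokens.append(("OP", c))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r}")
    return tokens

class Num:
    def __init__(self, value): self.value = value
    def accept(self, visitor): return visitor.visit_num(self)

class BinOp:
    def __init__(self, op, left, right): self.op, self.left, self.right = op, left, right
    def accept(self, visitor): return visitor.visit_binop(self)

def parse(tokens):
    """Recursive descent: one function per grammar rule."""
    pos = 0
    def peek(): return tokens[pos] if pos < len(tokens) else (None, None)
    def factor():                              # factor := NUM
        nonlocal pos
        _kind, value = tokens[pos]
        pos += 1
        return Num(value)
    def term():                                # term := factor ('*' factor)*
        nonlocal pos
        node = factor()
        while peek() == ("OP", "*"):
            pos += 1
            node = BinOp("*", node, factor())
        return node
    def expr():                                # expr := term ('+' term)*
        nonlocal pos
        node = term()
        while peek() == ("OP", "+"):
            pos += 1
            node = BinOp("+", node, term())
        return node
    return expr()

class Evaluator:
    """Visitor that walks the AST and interprets it."""
    def visit_num(self, node): return node.value
    def visit_binop(self, node):
        l, r = node.left.accept(self), node.right.accept(self)
        return l + r if node.op == "+" else l * r

print(parse(tokenize("1 + 2 * 3")).accept(Evaluator()))  # → 7
```

The visitor is what makes it easy to add a second AST walker later (say, a pretty-printer or a compiler pass) without touching the node classes.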
Mine also, strangely unlike many of these so-called "teaching" toy languages, has documentation. I don't understand the point of a tiny language for people to learn from if you made it tiny by removing all of the comments. :(
And, to try to avoid leading the reader astray, it calls out any shortcuts it makes. Those are hints where you'd want to do something more robust if you weren't trying to be minimal.
I agree. A counterargument is that you're inherently taking advantage of the processor(s) and the operating system, not to mention, ha ha, every single component and factor that you did not personally create, whether or not it is computational in nature...
Another route, my present hobby and something that I know many have done before: Take the Lisp program on page 10 of the Lisp 1.5 User Manual and translate it into C. Write a simple tokenizer/parser (read function) in C and translate that into Lisp. Once the C program can run the equivalent Lisp program, we're on our way.
At the outset I have no garbage collection. I just expect the memory space to fill up pretty soon. I could have `cons` compute a hash for each cell to eliminate common subexpressions, but for a while I suppose I won't bother at all. The C program has a loop to handle evcon (did I write cons? oops), eval, and apply as a trio of mutually tail-recursive operations. That's an obvious place to put a stop-the-world mark-and-sweep operation or some such thing.
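The hashing-`cons` idea above is usually called hash-consing. A rough Python sketch of it, under the assumption that every cell is created through `cons` (the names `Cell` and `cons` are illustrative, not from the Lisp 1.5 manual):

```python
# Hash-consing: cons consults a table keyed by (car, cdr), so two calls
# with equal arguments return the very same cell and common subexpressions
# are shared instead of duplicated.
_table = {}

class Cell:
    __slots__ = ("car", "cdr")
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr

def cons(car, cdr):
    # Cells are keyed by identity: this only shares structure when every
    # cell was itself built by cons, which is the whole point of the scheme.
    key = (id(car) if isinstance(car, Cell) else car,
           id(cdr) if isinstance(cdr, Cell) else cdr)
    if key not in _table:
        _table[key] = Cell(car, cdr)
    return _table[key]

a = cons(1, cons(2, None))
b = cons(1, cons(2, None))
print(a is b)  # → True: the common subexpression is shared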
To get the C tools out of the loop eventually, hand-disassemble the binary program (C compiler output) just to see what's there. Then write a Lisp program (compiler) that translates the Lisp 1.5 User Manual program into something pretty similar.
A counter-counter argument is that implementing a language that is semantically distant from assembly language, over top of assembly language, counts as "from scratch", whereas skinning a language over with an interpreter (which doesn't even scan tokens, collect garbage or perform any I/O) doesn't count as "from scratch".
> More interesting [...] are examples implementing a simple tokenizer, parser, and compiler to another source language
> This point of view does tend to displease the traditional SICP crowd though :-).
I disagree. SICP does a fairly good job of explaining what's behind compilers and other languages. See chapter 5, "Computing with Register Machines", which immediately follows the chapter containing the eval/apply loop.
Although the tokenizer and parser are sadly missing in SICP, these are also the most boring topics. Yes, you can write them by hand, but in the end you'll learn how to use regexes and parser generators. Which is important, but only a tiny fraction of what happens inside a compiler or interpreter.
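For what the regex route looks like in practice, here is a hedged sketch (not from SICP) using one combined pattern with named groups, in the style Python's `re` module documentation suggests for scanners:

```python
import re

# One alternation per token class; the group name tells us which one matched.
TOKEN_RE = re.compile(r"""
    (?P<NUM>\d+)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<OP>[+\-*/=])
  | (?P<WS>\s+)
""", re.VERBOSE)

def tokenize(src):
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        if m.lastgroup != "WS":               # drop whitespace tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = 40 + 2"))
# → [('ID', 'x'), ('OP', '='), ('NUM', '40'), ('OP', '+'), ('NUM', '2')]
```

This is the part a lexer generator automates for you, which is why hand-writing it once is instructive but rarely repeated.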
> but in the end you'll learn how to use regexes and parser generators.
In my experience, most production compilers and interpreters use handwritten parsers, not regexes and parser generators. The latter are usually confined to small DSLs and toy compilers, because things like error reporting and recovery are usually a nightmare with parser generators.
I don't agree that the tokenizer and parser bits are "boring", but they are extremely well covered compared to subjects like code generation.
The (binary) lambda calculus interpreter at http://www.ioccc.org/2012/tromp/tromp.c, documented in http://www.ioccc.org/2012/tromp/hint.html, is as tiny (and incomprehensible) as it gets, but does all of input, recursive descent parsing, translation to bytecode, lazy evaluation (call-by-need, to be precise), garbage collection, and output.
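On the call-by-need point: the essence is that an argument's body is evaluated at most once and the result is memoized. A minimal sketch of that idea in Python (nothing like Tromp's actual bytecode machine; `Thunk` and `force` are illustrative names):

```python
class Thunk:
    """Call-by-need: delay a computation, force it at most once, cache the result."""
    def __init__(self, fn):
        self.fn, self.forced, self.value = fn, False, None
    def force(self):
        if not self.forced:
            self.value, self.forced = self.fn(), True
            self.fn = None                    # drop the closure so it can be collected
        return self.value

calls = []
t = Thunk(lambda: calls.append("eval") or 42)
print(t.force(), t.force(), len(calls))  # → 42 42 1  (body ran exactly once)
```

Call-by-name would re-run the body on every force; the memoization is what makes it call-by-need.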
I loved the "How I start: Nim" article that was posted to HN a while back: an interpreter, then a compile-to-Nim implementation of Brainfuck, all in a handful of lines of code.
C'mon, why stop there? Implement Scheme in your toy lambda. Then implement another toy lambda in your Scheme, and a Scheme in that. Continue until it takes several minutes to do (fib 5). Then write a JIT compiler for that Scheme. See what it takes to get back down to pre-Inception levels of performance.
Now, you've actually learned something: Abstractions are costly.
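Even one layer of interpretation makes the cost visible. A rough experiment (all names here are made up for illustration): run fib natively, then run the same function as a tiny expression tree walked by a toy evaluator, and compare:

```python
import time

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# fib encoded as an expression tree for a toy tree-walking interpreter.
FIB = ("if", ("<", "n", 2), "n",
       ("+", ("call", ("-", "n", 1)), ("call", ("-", "n", 2))))

def ev(expr, n):
    if expr == "n":
        return n
    if isinstance(expr, int):
        return expr
    op = expr[0]
    if op == "if":
        return ev(expr[2], n) if ev(expr[1], n) else ev(expr[3], n)
    if op == "<":
        return ev(expr[1], n) < ev(expr[2], n)
    if op == "+":
        return ev(expr[1], n) + ev(expr[2], n)
    if op == "-":
        return ev(expr[1], n) - ev(expr[2], n)
    if op == "call":
        return ev(FIB, ev(expr[1], n))

t0 = time.perf_counter(); native = fib(20); t1 = time.perf_counter()
interp = ev(FIB, 20);                        t2 = time.perf_counter()
print(native == interp)  # → True
print((t2 - t1) / (t1 - t0))  # interpreted is typically several times slower
```

Stack a second interpreter on top and the slowdowns multiply, which is exactly the tower-of-Schemes effect described above.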
Many modern Schemes (and similar languages like Racket and Clojure) have many optimizations and use bytecode and JIT compilation, so the abstraction overhead is not as big as in the naive versions. And you get nice code and powerful macros in exchange.