Of course, the implemented language is simple enough to accommodate this simple implementation. Still, it looks nice, the code is very clear, and the comments are pretty good.
The syntax is of course very limited with a one-line lexer, so more sophistication here would somewhat expand the lexer, though if it remains centered around regexes (which it probably will), it can stay pretty small.
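To make the point concrete, here's a small sketch of what such a regex-centered lexer might look like. This is my own illustration, not the article's code; the token kinds (`num`, `id`, `punc`) and the exact pattern are hypothetical.

```javascript
// Sketch of a regex-centered lexer: one alternation of patterns,
// applied repeatedly. Unknown characters simply end the scan,
// which is acceptable for a toy like this.
function lex(src) {
  const tokens = [];
  const re = /\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|([()+\-*\/]))/g;
  let m;
  while ((m = re.exec(src)) !== null) {
    if (m[1] !== undefined) tokens.push({ type: "num", value: Number(m[1]) });
    else if (m[2] !== undefined) tokens.push({ type: "id", value: m[2] });
    else tokens.push({ type: "punc", value: m[3] });
  }
  return tokens;
}
```

Adding a new token kind is then just one more alternative in the regex plus one more branch, which is why the lexer can stay small even as the language grows a bit.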
When expanding the language, the tree building code can probably stay pretty small as well, if the syntax of your new constructs is chosen wisely. (I foresee control structures as prefix operators, which will be interesting to say the least.)
The evaluator and transpiler would grow significantly when adding control structures to the language: I think the most code-expanding task would be handling code blocks that aren't directly evaluated. In the case of conditionals, you can probably get away with the ?: operator, especially since JS also has the comma operator. But in the case of loops, you need a completely different structure in the output; whether that stays small would depend on the implementation.
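To illustrate the ?: trick: because both branches of ?: are expressions, and the comma operator lets you sequence several expressions into one, a transpiler can emit conditionals without ever leaving expression land. The AST node shapes below are my own invention, purely for the sketch.

```javascript
// Sketch: emitting a conditional as an expression using ?: and the
// comma operator, so the transpiler's output stays expression-shaped.
// Node shapes ("num", "if", "call") are hypothetical.
function emit(node) {
  switch (node.type) {
    case "num":
      return String(node.value);
    case "if": // cond ? (expr, expr, ...) : (expr, expr, ...)
      return `(${emit(node.cond)} ? (${node.then.map(emit).join(", ")})` +
             ` : (${node.else.map(emit).join(", ")}))`;
    case "call":
      return `${node.name}(${node.args.map(emit).join(", ")})`;
  }
}
```

Each branch joins its expressions with commas, so a multi-step branch still evaluates to its last expression. Loops have no such expression form in JS, which is why they'd force a structurally different output.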
Of course, adding control structures is only useful if there's some way of defining names, be it variables (producing an imperative language) or recursion-capable functions (producing a functional language). It might be an interesting exercise to see how far you can push this language while still keeping the compiler under, say, 100 lines. :P
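For the variable-definition route, the usual approach is threading an environment through the evaluator. A minimal sketch, with node shapes (`let`, `var`, `add`) that are my own and not the article's:

```javascript
// Sketch: name definitions via an environment object passed down
// the recursion. "let" binds a name, then evaluates its body in
// an extended copy of the environment.
function evaluate(node, env = {}) {
  switch (node.type) {
    case "num": return node.value;
    case "var": return env[node.name];
    case "let":
      return evaluate(node.body,
        { ...env, [node.name]: evaluate(node.value, env) });
    case "add":
      return evaluate(node.left, env) + evaluate(node.right, env);
  }
}
```

Copying the environment on each binding keeps scoping trivially correct at the cost of some allocation, which for a sub-100-line toy is the right trade.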
Protip if you do that: ditch the evaluator. A compiler doesn't need an interpreter to compile. ;)
When I did my degree, the older students were forbidden to use any language from the Lisp, ML or Prolog families for the compiler design assignment as that would be too easy.
As Java was just released by the time my class got to do the same assignment, we ended up using Java with JavaCC and JJTree.
While not as easy as Lisp, ML or Prolog, it was still much easier than the Jurassic yacc/lex that still prevails in some circles.
Something about that rule seems extremely backwards to me. Might as well implement a Prolog or Scheme in a lower-level language, then do your compiler assignments in that.
Reminds me of this quote "Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."
You’d be surprised how dominant hand-written recursive descent is in industry, and to be honest, I think it is often easier than using generator magic.
Everyone who is into compilers and parsing should try recursive descent at least once.
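For anyone who hasn't tried it, here's roughly what the exercise looks like: one function per grammar rule, with precedence falling out of which function calls which. This is a generic illustration (my own, for arithmetic with + and *), not anyone's production code.

```javascript
// Minimal hand-written recursive descent parser/evaluator for
// integer arithmetic with + and *, standard precedence.
function parse(src) {
  const toks = src.match(/\d+|[+*()]/g) || [];
  let i = 0;
  const peek = () => toks[i];
  const next = () => toks[i++];
  function expr() {            // expr := term ('+' term)*
    let v = term();
    while (peek() === "+") { next(); v += term(); }
    return v;
  }
  function term() {            // term := atom ('*' atom)*
    let v = atom();
    while (peek() === "*") { next(); v *= atom(); }
    return v;
  }
  function atom() {            // atom := number | '(' expr ')'
    if (peek() === "(") { next(); const v = expr(); next(); return v; }
    return Number(next());
  }
  return expr();
}
```

The appeal is that the code mirrors the grammar one-to-one, so there's no generated artifact to integrate and no tooling step in the build.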
People don't use parser generator tools because you wind up doing more work than needed to integrate it with the rest of your compiler. The reason to use a parser generator is not for ease of use or rapid development (because they are harder to use and slower to develop with) but for performance and reliability.
The first one isn't such a big deal for most programming languages, the second one is. Throw in better error handling and tricks like sub-file incremental and/or heuristic compilation (for IDEs), and the generators don't make much sense anymore.
> People don't use parser generator tools because you wind up doing more work than needed to integrate it with the rest of your compiler.
I don't think you can make such a sweeping generalisation. Both JavaCC and Antlr have saved me a lot of time when doing DSLs in the past and both are well documented and easy to use. It would have been much more work for me to write the parsers by hand in Java. Both could also generate my own hand-rolled AST easily and so I do not see why integration would be a problem for most projects. I agree that generators do not scale to large compilers or projects with special needs, but that is not most projects.
My last few languages have all been prototyped by hand, but they were all live languages (with continuous re-parsing and interpretation), so I didn’t have much choice :). I haven’t bothered with ANTLR since grad school.
I think I'd have been confused as to why using a parser generator was also not similarly regarded as being too easy :)
Interestingly, most mature industrial languages (e.g. Java, GCC, clang) use hand-coded recursive descent parsers. And of course such parsers can be implemented very easily from first principles using parser combinators in FP.
I've used both JavaCC and Antlr in the past for small DSLs, both are very good tools, when working with Java.
Not down in the lower levels of IR and IL, the trees have mostly disappeared by then. If you are just interpreting or doing unoptimized compiling, you can get by with just trees.
Yes, but I don't think they require a low-level IR in that case (purity, no re-assignment, no loops). LLVM does not use trees at that level, for example, because they really need to use stuff like SSA, basic blocks, and so on.
I thought it would be good to include the output of the example executions at the bottom of the page... then I remembered I could just paste that in a console - very cool!
A while back I started making a game in javascript called "BASIC Instincts" that was going to be about finding clues in the world and typing in BASIC programs from magazines to solve puzzles. The parser/interpreter was the most fun part (https://www.youtube.com/watch?v=GwBiJR_rj_w)
I did the same thing for a conference talk once, but also “golfed” each of the 4 components down to Tweet size (back in my day, Tweets were 140 characters): https://gist.github.com/tlrobinson/1257254
Every programmer should implement a simple Lisp from scratch at least once, just to understand compilers and interpreters aren’t magic.
This is really cool, thanks for posting this. I've always been vaguely curious how compilers work, but have never set aside time to research it. This was a great introduction, I have a better beginner's notion of how they work now :)
The interpreter is certainly an interpreter, but a transpiler is arguably also a compiler, albeit perhaps with an easier target than most conventional compilers. Assembly language and machine code are also languages you can program in; in that sense you'd also have to call gcc a transpiler ;)
Indeed, every compiler is a transpiler: it transpiles into asm. But not every transpiler is a compiler in the truest sense of the term. IMO only transpiling to machine language qualifies as compiling.
But we're just arguing about semantics here :) My comment was more directed towards the interpreter part of things.
It would help squelch the confusion for future forum bike shedding.
Someone seems to try and point this out every time it comes up like they're winning some sort of pedantry points, but I don't see how they could be. Please illuminate me so I understand the next time someone says this and possibly join la resistencia.
It doesn't use the built-in eval(). There's a local function in the code named "eval" that executes the operation -- not the best choice of names, really.