Hacker News new | past | comments | ask | show | jobs | submit login
C subset lexer, parser, code generator, and VM in 291 lines of C (umontreal.ca)
63 points by jdp on Jan 4, 2010 | hide | past | favorite | 7 comments



Marc Feeley is a pretty cool guy: in addition to developing the full Scheme system Gambit-C (http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Main_P...), with others he's done a variety of smaller useful systems, some of which you can find in the Gambit-C Wiki's Dumping Grounds http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Dumpin...:

Jss: JavaScriptScheme: a multithreaded Scheme to JavaScript compiler

PICOBIT: Very compact Scheme compiler and virtual machine suitable for microcontrollers

While there are of course many Scheme to Javascript systems (easy because Javascript is mostly "Scheme with an ALGOL face") I know the latter has been used in some serious projects.


Scheme->JavaScript isn't that easy, because of tail calls. But Jss handles it.


I've always thought it was standard practice to convert to non-left recursive form (when it will be implemented as a recursive descent parser, as it is here). I would have thought, that for pedagogical purposes, this would be of special importance. Consider the sum() function - why it always begins with calling term() is not clear from the grammar given (because they don't show the transformation):

    <sum> ::= <term> | <sum> "+" <term> | <sum> "-" <term>
However, I have to admit that stating the grammar as they have is much clearer, in terms of conveying what the grammar is (as opposed to implementing it). I'm implementing grammars quite often at the moment, so this is food for thought.


I think this kind of article is really interesting, because it gives us new ways of understanding the basics of complex technologies. The typical compilers course is designed to teach how huge monsters are developed (that is why we have a dragon book). It is interesting to see the minimum such monster and how we can improve on that.


I think it is only fair to mention that this is Tiny C, not full C.


That is in the title :)


But to be fair, this is a really tiny subset. It's really more correct to call it an augmented infix expression parser. There's no symbol table -- there are 26 fixed variables named 'a' through 'z', and no arrays/pointers nor other way of doing indirection, so 26 values is all you get. And there are no functions, so no recursion. So it's not even turing complete.

It is, however, a pretty neat hack if you're interested in the essence of how a parser and code generator works. This kind of thing isn't nearly as complicated as a lot of people assume, and this is a fun existence proof of that fact.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: