Hacker News new | past | comments | ask | show | jobs | submit login
Compiler Design in C (holub.com)
109 points by adamnemecek on Nov 15, 2014 | hide | past | favorite | 43 comments



If you are interested in a printed version of a great beginner's compiler book, I can highly recommend Andrew W. Appel's "Modern Compiler in C". There are also versions of the book written for SML or Java. I have read and partially implemented the C version and I really enjoyed the book, basic enough to follow yet full-featured enough to be useful.

The book starts with parsing (I prefer PEG or Pratt parsers for their simplicity (and tool independence) to be honest and skipped some of that chapter) but then goes into semantic analysis, type checking, code generation, optimization passes, even mentions basic type inference.

There is some code online at http://www.cs.princeton.edu/~appel/modern/c/ .


Great book, though I recommend the SML version over C or Java for a clearer illustration of the concepts in code. Viewed side by side, the SML code looks concise and obvious vs. the long and wordy C/Java code. It just seems to lend itself better to the task, in that some of the pattern matching burden is already handled by SML.


Totally agree, ML-style languages are great for writing compilers.


This book appears to be more of a "compiler-compiler design in C"; it goes through how to write a lexer and parser generator, then writes a compiler using them, and I think the resulting compiler is a bit of a letdown: it does not much more than translate C into a linearised subset of C, and so the IMHO more "interesting" and important parts of the back-end like register allocation and instruction selection are completely absent.

It's a good if somewhat outdated book if you're interested mostly in parsing and lexing, but for all the claims it makes in the preface about being practical instead of theoretical and all the source code presented throughout, I found the lack of actual Asm code generation (or any mentions of this compiler being able to compile itself) disappointing.

Parser/lexer generators also seem to have fallen out of favour for the creation of actual compilers, both big and small, at least for C-like languages; techniques based on recursive-descent (RD) are quite popular now.

On the "big, production-quality" side, gcc used a generated parser but moved to a handwritten RD-based one, and Clang always used RD. EDG's front-end, used in Intel's and other commercial compilers, is also handwritten RD. On the small "toy compiler"/hobbyist/experimentation side, there's TCC/OTCC, CC500 ( https://news.ycombinator.com/item?id=8576068 ), C4 ( https://news.ycombinator.com/item?id=8558822 ), SubC ( http://www.t3x.org/subc/ ), and many others, all based on RD parsers.

In fact I can't think of any C compilers at the moment that are using a generated parser/lexer...


"...for all the claims it makes in the preface about being practical instead of theoretical..."

It's important to view this book in the context of its time. At that time, there were no books that showed you both the theory and complete running code for even this much. The compiler books (Dragon, 1st ed.; etc.) showed toy snippets but not a full lexer, parser, and code generator. There were only the articles in Dr. Dobb's about Small-C to show the way. So, at the time, Holub's book was a godsend. It married theory and implementation in a way that simply had not been done before. Hanson's and Fraser's 1995 book on a retargetable C compiler was a similar milestone, although it came out several years later.

After those two landmark books, explanations of full compiler implementations were no longer a rarity.


This underscores what David Conroy used to say when writing the MWC compiler: Yacc makes the hard part harder and the easy part easier.

And I agree for all its thoroughly-written tutorial approach, I don't find myself going back to it very often.


Care to expose any good book explaining more modern approach for compilers from the practical point of view (including RD)?


Practical Compiler Construction: http://www.t3x.org/reload/index.html

This explains the SubC compiler I mentioned in the original post.


Great book! I have spent hours reading it multiple times soaking all information I could.

It was one of the first books about compiler design that I got hold of, back when Internet access was only available at the university.


I had this book at University. It made a good companion text to our main textbook, the dragon book. The dragon book covers a lot of ground, and is very interesting, but Holub's book is much more practical, with everything illustrated with real code examples (if my memory is correct). This was a big help when learning and made compilers a lot more fun. It's well worth reading.

It's interesting that most of the content is still just as relevant today. Coding styles change over time, and definitely the popularity of languages has changed, but the theory is still just as useful.

Off topic, but we had the 'International' edition of this book. Many publishers at the time did this, I'm not sure why. The explanation was that it was to make it more affordable for us, although the prices were still high. They were soft cover editions with plain covers (this book had a red cover with only the title and author, in white). The linked page is the first time I've seen the real cover. This is probably an early example of regional pricing, much like DVD region codes. They could charge more in USA and less in other territories I guess.


Mine was also the international edition, with dark blue cover. Not having been a CS major I bought a copy out of curiosity on how a compiler is made, since it seemed a lot more approachable than the famous dragon book. I admit I was a bit overwhelmed at the amount of code it took to build a compiler. Glad to see the book again in its entirety.


His book the C Companion [1]

Really helped me get a deep understanding of C. Sadly, I haven't touched C for such a long that I've forgotten most of it. I highly recommended it for anyone who has a basic knowledge of C and wants to get deeper.

[1] http://www.amazon.com/C-Companion-Allen-I-Holub/dp/013109786...


Can somebody recommend a book that covers recent techniques implemented in LLVM?


Much of the frontend is pretty much the same: parsing, semantic analysis, type checking. LLLVM mostly comes into play when generating the code. Instead of generating assembbler or opcodes directly, you call into LLVM to create data structures that represent low-level code and then a final call to gen machine code or JIT.

There are several versions of the LLVM Kaleidoscope language tutorial where you build a compiler that supports numeric operations, it is often based on Pratt parsers (at least the official C++ version[1]), but there's a C version on github [2].

In my opinion, you could get pretty far by reading the "Compiler design in C" or "Modern compiler implementation in C" up to the point of optimization and then applying the code generation from kaleidoscope tutorials. LLVM takes care of many optimizations.

[1] http://llvm.org/docs/tutorial/LangImpl1.html [2] https://github.com/benbjohnson/llvm-c-kaleidoscope


The good intro to compilers is Engineering a Compiler. It focuse more on optimization then most intro books, and its all SSA.

If you want to go all in on SSA optimization, there is a book called Static Single Assignment Book and its written by a hole list of compiler writers. Its not finished but there is still a lot of information.

You can find it here: http://ssabook.gforge.inria.fr/latest/book.pdf

Or you can go with the classic, Advanced Compiler Design & Implementation. See here, http://www.amazon.com/Advanced-Compiler-Design-Implementatio...

All of them will teach you a lot about LLVM.


Speaking of SSA and optimizations, I just remembered that Andy Wingo (core contributor to Guile scheme) has a treasure trove of great articles, e.g. a fun intro to SSA [1] or CPS as used in Guile [2]. His blog is totally worth just browsing around.

[1] http://wingolog.org/archives/2011/07/12/static-single-assign...

[2] http://wingolog.org/archives/2014/01/12/a-continuation-passi...


Yes. His posts are a great. He is working on a VM and I do to so I follow his work closly.


Interesting that on page 6 of this book, it talks about the same idea as LLVM (as I understand it):

"There are advantages and disadvantages to an intermediate-language approach to compiler writing[...] Intermediate languages give you flexibility as well. A single lexical-analyzer/parser front end can be used to generate code for several different machines by providing different back ends that translate a common intermediate language to a machine-specific assembly language. Conversely, you can write several front ends that parse several different high-level languages, but which all output the same intermediate language. This way, compilers for several languages can share a single optimizer and back end."


How I love these 90s style book covers, color palettes :)


I agree, there were more interesting/amusing cover illustrations than the abstract, generic ones on the bulk of compiler and computing books today. That's how the Dragon Book got its name too.

Related article: http://www.globalnerdy.com/2007/09/14/reimagining-programmin...


I was (and still am) deeply disappointed when I got my copy of the dragon book and there's no dragon on the cover. That is blasphemy. What next? SICP with no wizards?


Also the Dinosaur Book[1]. I don't think Silberschatz et al. could have pulled this off if they were starting today.

[1] http://galvin.info/2007/03/13/history-of-the-operating-syste...


Speaking about covers, Forth Programmer's Handbook comes to mind :)

http://www.amazon.com/Forth-Programmers-Handbook-3rd-Edition...



Why would you redesign SICP and neglect having a wizard


I think that these days, if you're writing a compiler, it would be better done in one of * c++ to leverage llvm (afaik c++ is the best way to do that) and lots of existing code * some sort of ml, because ml is really good for lots of the things compilers need to do * self-host, because the compiler may be the best large-program workout your fledgeling language is going to get


In the same vein, see "lcc, A Retargetable Compiler for ANSI C": https://sites.google.com/site/lccretargetablecompiler/


What real world experience does Allen Holub have with compiler design? As far as I can tell, he has not contributed to gcc, LLVM,or written any toy compilers like TCC.


why use c? its a horrific, logically incongruent lang


Lowest Reasonable Common Denominator


Maybe for fun? To learn something?


It was 1990. There were not many options. The standard language for technical books then was Pascal. C was a huge improvement over it. In addition, Holub was the C columnist for Dr. Dobb's.


I won't echo this sentiment, but; why use C for writing a compiler? It seems that languages in the ML family, among others, are nicer for that kind of domain.

I guess it at least has its uses if you want your compiler to be really fast.


If you want to have fun exploring how compilers work, in your spare time after a hard day working in C++, what better way than to reimplement a language you know?

On the other hand, my projects in this vein are C++ projects but they implement functional languages or Scheme. I have implemented one parser that creates a syntax tree from a simple equational language, and several interpretive backends, such as SKI-combinator-based graph reduction and a TIM-based interpreter. I do care about the speed of the compilers I use for production code, but not for this work. The Scheme compiler I wrote was written to run on my own Scheme interpreter which was written in C++.

The reason I think most hobby compilers I know of don't go the way I've gone is that it takes tons and tons of research to find out how functional languages are implemented, and the Appel books are a bit daunting, too.


The book was published in 1990. I worked on a commercial compiler in 1990, and many of our customers ran it under 16-bit MS-DOS.


I know what you mean, man. Lots of n00bz, ricers, and fanboiz will claim C is the fastest, most efficent lang because you can get "close to the metal" and "tweak". In reality, any code beyond trivial complexity will benefit much more greatly from algebraic rectification, which can only be done with certain languages that are amenable to formal analysis.


> In reality, any code beyond trivial complexity will benefit much more greatly from algebraic rectification, which can only be done with certain languages that are amenable to formal analysis

What exactly is "algebraic rectification"?

While it is generally true that having a formal semantics aids greatly in analysis, it is worth noting that a very large amount of program analysis work is targetted towards C. (And mind you, flexible, high level languages bring with it their own troubles. Analysis in the presence of higher order functions is not a panacea at all)


Until I see an award-winning 4k/64k demo written in one of these ultra-high-level languages, I stand by my opinion that C is more efficient.


Check your facts. A large majority of award-winning 4k/64k are written in C++.


his point is still valid though



I think what flatestcat wants is something that actually functions and is proven to function correctly. Frama-C is yet another open source project that is long on hype but woefully short on delivery of actual, working code.


Replying to oneself using a bunch of different accounts is an abuse of this site.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: