An Intro to Compilers

simonebrunozzi · on Aug 14, 2017

Ha, sweet memories.

I am 40 now; back in 2004-2006, for two years I was one of the youngest professors in Italy, teaching "compilers and programming languages".

I still feel so fortunate to have had that experience.

To help my students save the relatively huge amount of money to buy the dragon book(s), I created a condensed version of the parts that were required for the course - and I didn't charge anything for it, unlike what usually happens pretty much everywhere in Italy. They were available in PDF and OpenOffice formats, on the website that I created for the course (yes, the CS department didn't really have a proper website to use as CMS - I kid you not).

You can find the material here, it's in Italian but it might be fun to take a quick look: http://www.lulu.com/shop/simone-brunozzi/dispense-lab-lingua...

Such good memories.

gnuvince · on Aug 14, 2017

I was T.A. for COMP-520, Compiler Design, at McGill University for two years, and it was the best job I've ever had. The compiler project had not been updated in 10-12 years; my adviser trusted me to make a new, interesting project. I was paid by the department for 5 hours of work per week, but I poured in more than 35 per week to design, document, and implement a subset of Go

My hard work paid off immediately. Students were more interested by a language that was new, modern, and used in the industry than by the previous language that was used, Wig. As a result, enrollment jumped from 12-14 students to 45. Though it was a subset, and much of what makes Go interesting was removed (concurrency, interfaces, GC) it was still an extremely demanding project. Many times I thought that the project was too big, that we needed to cut back, but the students were real troopers and chugged through (by the way, when did 20-year olds become so smart?! I was never this bright when I was their age!) and all teams completed the project and made me very proud.

You can see the page for the first year of the class, along with relevant documents at http://www.cs.mcgill.ca/~cs520/2015/

negativ0 · on Aug 14, 2017

Do you have a link to download it without registering to lulu.com ?

simonebrunozzi · on Aug 14, 2017

Oh, sure, I just put the most relevant stuff (in PDF and SXW) in a single zip file here: https://www.dropbox.com/s/medbtnjbd4lgoeb/lab-compilatori_br...

atas · on Aug 14, 2017

I did the same for Tanenbaum's operating systems book and for a post-secondary education course (around the same time and around the same age).

simonebrunozzi · on Aug 14, 2017

Fantastic! How was your experience overall? Did you do it in the US?

atas · on Aug 15, 2017

Greece. I learned a lot in the process having just an undergraduate degree at the time. The students didn't appreciate it much unfortunately (wasn't a university but a vocational school; don't ask why Tanenbaum was on the curriculum). Just one came to me after the exam (I had to pass them all anyway) and congratulated me of making it so hard for them (him).

agumonkey · on Aug 14, 2017

There are two Aho Ullman books at my library, dense but amazing books.

ps: how did students do on this class ? too hard but loved ?

simonebrunozzi · on Aug 14, 2017

They did decently well. I wasn't too hard on the grades, but I wasn't easy either.

agumonkey · on Aug 14, 2017

Do you have a public website for the course so I can see the problems ? :)

simonebrunozzi · on Aug 14, 2017

Ah, it was online until about 2007 or 2008. You might be able to find it with the wayback machine (it was something like www.wedoit.us/labcompilatori). I might have it in my old backups, I will check.

agumonkey · on Aug 14, 2017

It is indeed archived, up until 2010: https://web.archive.org/web/20080716040254/http://www.wedoit...

Rodd45 · on Aug 14, 2017

This is great Simone, thanks!

bitL · on Aug 13, 2017

BTW, can anyone please suggest a good online compiler course for creating your own programming language with labs/exams/certs? I can find pretty good courses for almost anything but not for compilers... Something that would focus on handling grammar (LALR, LL, CYK etc), ASTs, semantics, types, code-completion, imperative/OOP/functional/logical/self-modifying constructs etc. or even NLP-to-AST conversion...

jpolitz · on Aug 14, 2017

My compilers course has all the lectures recorded, and has starter code for all the assignments provided. It provides all the parsers, and focuses on the backend, including memory management and various details of value representations.

This is just my materials, I don't have an associated platform with verification/certification. However, some folks from HN have previously done the course and emailed me after, and seem to have gotten some value out of it.

https://cseweb.ucsd.edu/classes/sp17/cse131-a/

https://podcast.ucsd.edu/podcasts/default.aspx?PodcastId=401...

https://github.com/ucsd-cse131-sp17

Earlier offering:

https://www.cs.swarthmore.edu/~jpolitz/cs75/s16/

gilmi · on Aug 14, 2017

I'm using the earlier offering of your course and another[0] to learn about compilers for a while now and it's the only method that really clicked for me. Thank you very much for making this open!

[0]: http://www.ccs.neu.edu/course/cs4410

jpolitz · on Aug 14, 2017

Ben Lerner did a great job writing notes [0] for his version, which are super helpful if you prefer reading to videos.

[0] https://course.ccs.neu.edu/cs4410/#%28part._lectures%29

gilmi · on Aug 14, 2017

Definitely! They are really well done and actually worked better for me than videos. Unfortunately they only cover the first half of the course.

rmusial · on Aug 14, 2017

Thanks for this, this looks amazing

abiox · on Aug 14, 2017

i started watching the first lecture; when you ask about the output of ghc, i love that someone says "burritos".

thanks for making this material available!

kerkeslager · on Aug 13, 2017

Matt Might doesn't have an organized course for free online, but I found that his blog filled in a lot of the gaps for me between compiling a language like C, and a more modern language like Scheme or Python, which are missing from other resources I've found online. You'll have to click around to different posts, but here's a good starting point:

http://matt.might.net/articles/compiling-scheme-to-c/

DonaldPShimoda · on Aug 14, 2017

Well... he kind of does, actually!

http://matt.might.net/teaching/compilers/spring-2015/

kerkeslager · on Aug 15, 2017

Awesome! Thanks for that.

piemonkey · on Aug 13, 2017

Alex Aiken has a very nice course: http://openclassroom.stanford.edu/MainFolder/CoursePage.php?...

sn9 · on Aug 14, 2017

You can find this on Stanford Lagunita in a MOOC format.

dochtman · on Aug 14, 2017

I liked the Stanford course given by Aiken, on Coursera:

https://dirkjan.ochtman.nl/writing/2012/07/21/compilers-on-c...

Also, if you like this sort of thing, my compiler in Python that compiles to LLVM IR might be interesting:

https://github.com/djc/runa

dipper · on Aug 14, 2017

Most of this stuff flew over my head. While I understand C, this stiff feels out of grasp. Can anybody also please suggest recommended reading for beginners?

phenylene · on Aug 14, 2017

Crafting Interpreters is a great read:

http://www.craftinginterpreters.com/contents.html

dipper · on Aug 14, 2017

This looks great, thanks!

pbiggar · on Aug 14, 2017

Pet peeve about compiler education that is not repeated by this article (yay, well done!): the way you have to wade through months of parsing bullshit to get to the really interesting stuff (optimization and static analysis).

DannyBee · on Aug 14, 2017

"the way you have to wade through months of parsing bullshit to get to the really interesting stuff (optimization and static analysis)."

If you look back in the late 70's, you'll discover something interesting: Almost everything was parsing of one form or another.

Really. Parsing of flowgraphs, etc. Dataflow was done not just by bitvector methods (which were fairly new), but by parsing of graphs using graph grammars.

Most optimization analysis are very closely related to parsing (and can often be solved by graph reachability). The complexity bounds of most of them are also driven by parsing complexity results.

For example, andersen-style points-to analysis is solvable as a context-free-language problem. The time bounds of it are driven by the time bounds on context-free-language parsing. In particular, all-pairs reachability on Context Free Languages (N^3).

etc Parsing is the basis of compilers for a very good reason. The fact that we have advanced other types of methods on top of analysis for optimization and analysis does not change the fact that, at the core, a lot of them are based on equivalence to parsing of graphs and graph reachability.

Another way to look at it: If you ever want to be able to be really good at compiler optimization, don't skip parsing. Otherwise you will miss the connections that are really important to being able to design efficient new algorithms, instead of just learning old ones.

nostrademons · on Aug 14, 2017

Parsing also comes up in a stupidly large number of other areas within programming, including: serialization formats, network protocols, DSLs, log analysis, time series interpretation, NLP, unstructured data mining, and probably several I'm not familiar with. Fundamentally understanding what's going on when you're doing recursive-descent, shift/reduce, or combinator-based parsing unlocks a big toolbox that you can use to solve a number of other problems.

pbiggar · on Aug 14, 2017

I'm not saying it isn't useful - I'm saying it's not interesting (and I'm talking about the act of parsing, not the algorithms as applied elsewhere). Think about this: you start learning compilers, you spend months on parsing, and after that what do you have? Nothing - you "compiled" stuff into an AST. Yawn.

DannyBee · on Aug 14, 2017

"I'm not saying it isn't useful - I'm saying it's not interesting "

Then you are doing it wrong :)

"(and I'm talking about the act of parsing, not the algorithms as applied elsewhere)"

I'm not sure what this is supposed to mean. It sounds like you are trying to separate parsing text into ASTs and parsing flowgraphs, so you can claim one is boring, and the other is "algorithms as applied elsewhere". There is no such separation. We just happen to name variables a little differently so it doesn't quite look like "figure out if this one parentheses matches this other one". But it is the same algorithm. If you are trying to separate parsing as the thing that happens that makes an AST, that is not the only thing that is parsing. Yes, some books colloquially separate phases like that, but it's silly.

"Think about this: you start learning compilers, you spend months on parsing, and after that what do you have? Nothing - you "compiled" stuff into an AST. "

You are definitely doing it wrong. The same parsing of a grammar you taught someone to make an AST, you can now say "we can apply a slightly different language grammar to the AST and generate machine code", or "apply a slightly different language grammar to the AST and parse out flowgraphs and statements"

etc parsing isn't just take a bunch of text, turn it into an AST. The fact that a ton of people think this is one reason there are so many crappy little compilers that never go anywhere :)

pbiggar · on Aug 14, 2017

We're having two different conversations here. You are saying (correct me if I'm wrong) that the algorithms you use for parsing are broadly applicable to program analysis. You are not wrong. It's just orthogonal to my point, which is that program analysis is really interesting.

To make an analogy here (from Lockhart's lament: https://www.maa.org/external_archive/devlin/LockhartsLament....), you are saying that color theory is massively important throughout one's career as a artissti. I am saying that I just like painting things.

Sure, there's lots of cool shit in parsing theory no doubt, with tons of broad application. But the way that we are taught it ("here's how we get an AST from written text") turns away people who are into program analysis. As such, we shouldn't consider parsing a pre-requisite for program analysis, regardless of how interesting or applicable the theory of parsing is.

nostrademons · on Aug 14, 2017

I think DannyBee's using a slightly broader definition of parsing than you're used to. There's the narrow definition of parsing ("Here's how we go from a sequence of bytes to an AST") that's the concrete material taught in compiler courses. But there's also a much broader paradigm where "parsing" is broadened to "Here's how we go from a listed set of patterns, possibly overlapping, to a set of transformations defined by those patterns." The source input need not be a linear set of bytes; you can trivially extend it to trees by taking an in-order traversal of the tree, or even operate directly on tree or graph data structures.

With the broader definition, a large number of what you consider the interesting parts of program analysis are actually parsing problems. Instruction selection, for example, involves tiling the AST (or more typically, the IR) with a minimal set of machine instructions. In many compilers, this is implemented with a machine-definition file that describes a grammar for transforming the IR into machine-language instructions. Peephole optimization involves recognizing certain patterns (eg. division by a constant power of 2) and then transforming them into more efficient constructs (eg. shift-right). This can also be described by a grammar on the IR.

I'm not sure if this is explicitly taught in compiler courses - it wasn't really in mine, but it was alluded to - but it's a really powerful way of thinking of the passes in a compiler. It lets you unify concepts, so that instead of memorizing all the specific algorithms used, you can think of a pass as "this is a transformation from this specific input language to this specific output language".

gizmo686 · on Aug 14, 2017

As much as I enjoyed optimization and static analysis, parsing is probably the single most useful knowledge I got from my formal CS education; and probably the only part of a compiler that most CS students will ever need to know.

joe_the_user · on Aug 14, 2017

Yes, I think understanding that there's a fundamental difference between the operation of parsing a language and "naive call-and-for-loop programming" is very important. Most naive efforts to parse a language starting with no knowledge simply fail (and that can include, say, things efforts to validate a file name in a somewhat complex file system or to parse a packet in a network protocol).

In college, it took me a bit of time to understand what a formal language really was and what the difference between a parser and a simple string processor was and when I arrived in industry, I notice this were something a certain proportion of otherwise experienced programmers simply did not understand this.

olewhalehunter · on Aug 14, 2017

The symbiotic relationship between compiler development and hardware architecture design has a defining role in CS problems, and while you don't need to understand that relationship to write CRUD apps for a living, I think ignorance of it robs those who really want to understand this stuff of truths both technical and profound.

yeowMeng · on Aug 14, 2017

Totally!

Compiler development = Finite State Automaton + Push Down Automaton + Context Free Grammar + Intermediate representation + Static Analysis/Optimiation + Stack Machine + ...

Hardware Architecture = Stack Machine/Turing Machine + electricity + creativity

Stack Machine == Turing Machine

Compiler development starts and ends with Turing Machines.

Hardware architecture ends with Turing Machines operating at the speed of electricity through metal.

olewhalehunter · on Aug 14, 2017

There are alternatives to this mainstream style of computer architecture, such as dataflow or hybrid quantum-classical systems, which both require a very different kind of compiler.

https://en.wikipedia.org/wiki/Dataflow_architecture

userbinator · on Aug 14, 2017

A recursive-descent parser is probably the only part of a compiler which people will easily derive on their own, even without taking a compiler course.

anarazel · on Aug 14, 2017

Funnily enough it's also where a bunch of mature compilers eventually ended up with, because of speed, heuristics and so forth. Like GCC, which used to have bison based compilers IIRC, but but is hand rolled rec dec...

vidarh · on Aug 14, 2017

It's not so much for speed and heuristics but for error reporting and/or recovery.

As it turns out (and I've tried many times myself too), designing a parser generator that will retain a compact representation without losing good error reporting is very hard.

E.g. consider a BNF type grammar. Almost every symbol represents a potential error. But often deriving that error message automatically in a way that is helpful to a human is incredibly hard because the grammar lacks information about which mistakes are more likely for humans to make.

So we end up with bad error reporting, or you end up having to write code to analyse errors anyway, and then a lot of the benefits disappear - the actual parsing is usually easy to do compactly with a recursive descent parser and a tiny library of helpers anyway.

anarazel · on Aug 14, 2017

Yea, I was imprecisely using heuristics to primarily be about error reporting. You essentially have to partially accept a wider set of grammar, to be able to throw more intelligent errors.

But speed ime also is a major issue. GCC sped up a good bit when switching, and we are seeing speed issues with bison generated parsers.

Ironically one of the benefits of bison is that you get feedback about whether your grammar is conflict free. Can be a significant advantage over a hand built recdec parser for a complex language like SQL.

vidarh · on Aug 15, 2017

Speed certainly can be a nice bonus, but you can also generate recursive descent parsers fairly easily if you have a suitable representation. My own parser generator attempts did that.

The feedback certainly is useful, and I'm all for people using such tools to validate grammars for testing. Though people manage to abuse that by writing extra code too (Ruby being a prime example of using a parser generator and then abusing the hell out of it in ways that complicates the grammar).

I really wish more work would go into research on automating error reporting, though. As much as I handwrite most of my parsers now, I wish I didn't have a reason to.

anarazel · on Aug 15, 2017

> I really wish more work would go into research on automating error reporting, though. As much as I handwrite most of my parsers now, I wish I didn't have a reason to.

Indeed.

_d8fd · on Aug 14, 2017

I recently made myself sit down and learn leex (essentially lexx for Erlang), and I found that to be an extremely useful exercise. It helped me understand how to think about what kind of inputs a parser can make use of, and when it makes sense to skip the tokenizing step.

bollu · on Aug 14, 2017

I've been writing an llvm-lite in haskell to show of the main optimisation ideas of LLVM [here](https://github.com/bollu/tiny-optimising-compiler). Thought someone maybe interested :)

userbinator · on Aug 14, 2017

I know it might be a contrived case, but it's rather disappointing to see that even with the -O2 optimisation level, the final generated code contains 43% (3/7) useless instructions to mess around with RBP. I mean, it should be bleeding obvious to the compiler that this function doesn't require any stack allocation, much less an RBP-based one (the tradeoff between using RBP/RSP-based allocation is far more subtle, and I can forgive a compiler for choosing nonoptimally in that case), so why did it emit those useless instructions?

There's plenty of articles about how intelligent compiler optimisers can be, and I don't doubt that they can do some pretty advanced transformations, having read a lot of compiler output; and then, I see things like this, which just make you go why!?!? It's puzzling how compilers behave, applying very sophisticated optimisation techniques but then missing the easy cases. The effect might not be as great as more advanced optimisations, but so is the effort required to do this type of optimisation. This is not even low-hanging fruit, it's sitting on the ground.

However, the compiler will use as few registers as possible.

Compilers usually try to use all the registers they can (although sometimes they do exhibit odd non-optimal behaviour, as described above); eschewing register use and using memory instead will increase code size and time.

brigade · on Aug 14, 2017

The "useless" instructions aren't for potential stack allocation, they're there so backtraces work. If you don't care about that then you can add -fomit-frame-pointer and they disappear.

Compilers don't default to this because having backtraces on crashes and profiling are more useful than the like 1% speed gain from another available register on x86-64.

userbinator · on Aug 14, 2017

Backtraces work without that.

http://yosefk.com/blog/getting-the-call-stack-without-a-fram...

brigade · on Aug 14, 2017

Are you arguing that they do work on ELF+DWARF with sufficient extra debug info, when running under a debugger, despite documentation that explicitly says it doesn't work? Because I certainly don't expect my users to have a debugger installed (let alone regularly run programs under one) but I'd still like to get useful crashlogs automatically.

Or are you saying that you could potentially reconstruct a backtrace, if someone wrote code to do so that is smart enough to locate and account for all the stack adjustments in reconstructing where each function stored its return address, including runtime alloca() / VLA adjustments and multiple adjustments in the same function (potentially needing to reconstruct the code flow), just from the disassembly? Because the linked MIPS code is nowhere near that smart, and to my and that author's knowledge, such code has not been written for x86.

Either way, nice in theory but not good enough in practice.

userbinator · on Aug 14, 2017

The debugging info is enough to find, for each code address, the offset to the return address on the stack, from which you can find that offset, etc. You don't need a debugger but saving the stack is enough to get a backtrace for a crashed process.

lultimouomo · on Aug 14, 2017

You can have a minified memory dump sent automatically, then symbolicate it on your servers using the stripped debugging symbols. That's what Breakpad does. I'm not sure how well it deals with alloca(), but I've used in on several shipped pieces of C++ software and it worked mostly fine.

DSrcl · on Aug 14, 2017

Compile with -fno-frame-pointer and you won't see those code. It's for debugger.

WalterBright · on Aug 14, 2017

One nice thing today is the source code to professional compilers is readily available. Pick one for a simpler language, and just read the code.

sigjuice · on Aug 14, 2017

Is there a particular language that you could recommend? Thanks!

blacksmythe · on Aug 14, 2017

Perhaps the one he wrote?

    Walter Bright is the creator and first implementer of the D programming language... http://www.walterbright.com/

WalterBright · on Aug 14, 2017

D is a fairly complex language. I'd start with a simpler one, like a Pascal or Javascript compiler.

For the latter, you can look at:

https://github.com/DigitalMars/DMDScript/tree/master/engine/...

b4ux1t3 · on Aug 14, 2017

Cheers to you for not just trying to force your own creation down people's throats. D is a cool language, but you're right, it's a bit complex for a beginner.

jnordwick · on Aug 14, 2017

I'd also say the Java compiler is fairly simple too, but it lack any real optimization passes.

WalterBright · on Aug 14, 2017

A Java compiler lacks optimization and code generation (generating IR is not code generation). So does Javascript. But they are both great languages for learning how lexing, parsing, and semantic analysis works because those operations are relatively simple.

C is a harder one because the multiple "phases of translation" add quite a bit of complexity. It's easy to get lost in the frustrating and pointless complexity of the preprocesser. C++ layers onto that mixing semantic analysis up with parsing, and an unusually complex semantic analysis. Not for beginners.

I suggest reading the source code to a professional compiler because there are books about how compilers work, and then there's how compilers really work. Pro compilers tend to be built differently from academic or side project compilers.

xoroshiro · on Aug 14, 2017

I don't understand why this still looks scary to me. A while back after learning regex (though only at a basic level) I thought, maybe I can make my own custom C preprocessor as an exercise. Perfect right? I get to choose the syntax AND the rules. But somehow I just can't write it. Nested ifdefs, defines, undefs mixing together too OP.

Reading through this, it seems like there's a huge amount of work into real compilers. Front end, optimizer, backend, etc. I do appreciate it more, but as useful as compilers are, maybe I'll just leave it to the pros (:

le-mark · on Aug 14, 2017

> maybe I'll just leave it to the pros (:

Don't ever do that! It really isn't that complicated at a basic level. Regex aren't the solution for parsing. What you're missing is the recursive nature of "context free languages"[1]. Formal language theory [2] was one of the big, early wins of Computer Science. There's a lot to it, but writing a simple recursive descent parser doesn't require any of it.

The big caveat is that parsing is only about 1/3 of the problem, the others as described in this article are optimization and code generation. And of these, optimization can be skipped entirely leaving only code generation, which can be naively done by walking the AST. If this all seems foriegn to you, there's a lot of really good info online nowadays ie people love writing about this subject.

Understanding this subject is very important in my opinion, this is one of the foundational things we do. Attaining a comfort level with compiler engineering is one the two or three things anyone can do to really "level up" as a software developer. Some other things are writing multi threaded servers in C, and 3d software rasterization. Again, my opinion here.

[1] https://stackoverflow.com/questions/559763/regular-vs-contex...

[2] https://en.wikipedia.org/wiki/Formal_language

xoroshiro · on Aug 14, 2017

>Understanding this subject is very important in my opinion, this is one of the foundational things we do. Attaining a comfort level with compiler engineering is one the two or three things anyone can do to really "level up" as a software developer.

I think it depends on the level of abstraction.

Then again, I'm not really a software developer, most of the work I do is scripting stuff for data analysis, which is where I learned regex from. I did learn C in college which I somehow got fascinated with, but I never really do any serious work with it. Still, over time, reading about quite lower level stuff (compared to what I do) does seem like it helps take the mystery out of things like unintuitive behaviors with multiple references to a Python list (I suspect it was pointers all along). Taking it to the next lower level with studying compilers and how assembly gets generated doesn't seem like it will benefit me much more.

Still, I do like C enough to possibly consider doing something in it for fun if I get an idea, so maybe one day I'll come back to this.

jtolmar · on Aug 14, 2017

Do you think you could write an interpreter?

If so, you can write an interpreter that "interprets" by spitting out a bunch of Java or Python (or whatever). That's a transpiler.

If you can write that, you just need to learn LLVM or x86 well enough to transpile to that.

bogomipz · on Aug 14, 2017

Wow I enjoyed both the content and the design aesthetic of this post. I guess that's what happens what a designer turns software engineer? Please post more like this!

fleshweasel · on Aug 14, 2017

I'm a little confused as it looks like an image in the article labels a string literal as being a comment. Am I misunderstanding something?

b4ux1t3 · on Aug 14, 2017

Note that it is underlined in yellow. The color of the underline is what denotes it as a literal, not the color of the text itself.

6nf · on Aug 14, 2017

Yea it's just a mistake, it should be string literal.

andreasgonewild · on Aug 14, 2017

Compiler design depends a lot on the language being compiled though, this would be a typical design for a C-style language.

When doing Forth-style languages for example; I've found that it's not very helpful to parse to trees, since I can't tell the structure of function calls before seeing the actual running stack anyway. Which in turn affects the compiler, since all it has to work with is a sequential stream of tokens.

And not all languages are stand-alone, or compile all the way down to machine code. Compiling to byte-code for running immediately, with the backup of a runtime library; is an attractive option for more specialized languages; and this means that it's possible to carry more information straight over from compilation to runtime.

I've been working on a scripting language that raised some of these questions lately:

https://github.com/andreas-gone-wild/snackis/blob/master/sna...

jacquesm · on Aug 13, 2017

If this stuff interests you, even though it is quite old you really should read the 'dragon book':

https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniq...

tmccrmck · on Aug 13, 2017

We've rehashed the debate over the utility of the dragon book in the 21st century so many times on HN that I'm not going to comment on it's usefulness. (Search if interested)

But, for those interested, I highly recommend Engineering a Compiler by Cooper and Torczon OR Modern Compiler Implementation in ML by Appel. I've read all three and would only recommend the dragon book to someone very interested in the subject.

CalChris · on Aug 14, 2017

The Monica Lam sections in the second edition of the Dragon Book are out of this world good. Also, she's an excellent lecturer.

Muchnick's book is still quite good.

A second semester compiler class is rarely taught and so everyone tries to squeeze too much into one semester. I almost wish they broke it in two and allowed students to take either half. LL(1) parsing isn't a prerequisite to static single assignment.

pbiggar · on Aug 14, 2017

+1 for Cooper/Torczon (as someone who has a PhD in compilers, and still hates the Dragon book).

timthorn · on Aug 14, 2017

Appel's book is also available in many other languages, if the thought of ML is offputting.

sigjuice · on Aug 14, 2017

Why would the thought of ML be off-putting?

timthorn · on Aug 14, 2017

If you're unfamiliar with it but want to learn about compilers, a text in a language you know will decrease the cognitive load.

kp25 · on Aug 14, 2017

I'm usually scared to look at these compilers titled articles, feeling I do not understand or learn anything from it.

This article is a very good intro to compilers and now I understood how these things work. Thanks for the article.

JustSomeNobody · on Aug 14, 2017

I love reading about compilers. It's fascinating. I have implemented my own PL/0 compiler in various languages as a way to learn those languages. It's fun without having to think about designing a new language.

Question: Does anyone know of some resources for adding concurrency to a simple stack based VM? I've been wanting to look into that for a while just for kicks.

So, basically, take something like the machine here: https://en.wikipedia.org/wiki/P-code_machine

and add concurrency to it.

Tehnix · on Aug 14, 2017

> Some compilers translate source code into another programming language. These compilers are called source-to-source translators or transpilers

I was wondering how compile-to-JS languages would be classified. Technically they are all transpilers, but seeing as there is no lower-level for the browser than JS (barring WebASM), one could also argue that they are in fact also compilers.

Anyone have any thoughts on this?

khedoros1 · on Aug 14, 2017

> Technically they are all transpilers, but seeing as there is no lower-level for the browser than JS (barring WebASM), one could also argue that they are in fact also compilers.

They're compilers because they translate from one computer language into another. It has nothing to do with how low of a level the browser supports running. They're additionally transpilers (a subset of compilers), because the languages that they translate between are both high-level languages.

colejohnson66 · on Aug 14, 2017

So all transpilers are compilers, but not all compilers are transpilers?

sitkack · on Aug 14, 2017

Correct. Transpilers are a subset of compilers that go from a HLL to a HLL.

mbrock · on Aug 14, 2017

Although the following (from Wikipedia) seems to accord with usage:

> The name compiler is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language, object code, or machine code) to create an executable program.

Maybe we should complicate that a bit by saying that a compiler typically compiles a programming language into a grammar where the higher-level features of the source language must be "expanded" or "linearized" in some substantial way to be expressed in the target language.

So, for example, GHCJS is more like a compiler than a transpiler, even though its target language is JavaScript, because the JavaScript that it generates is full of graph reduction stuff that is implicit in the source Haskell.

On the other hand, CoffeeScript is more like a transpiler, because it doesn't do any such substantial expansion or linearization.

sitkack · on Aug 15, 2017

I like that.

Compilers lower input source into something more linear.

Transpilers move input source laterally.

They are different, which I dislike people saying they are not.

_d8fd · on Aug 14, 2017

pbiggar · on Aug 14, 2017

Transpiler is a recently made up word, and describes something that's just as well described as a compiler. I wouldn't spend too much time hoping for a distinction here.

chrisseaton · on Aug 14, 2017

> Transpiler is a recently made up word

People keep saying this, but it's not true. You can find the word being used, in its modern meaning, as far back as 1964.

https://academic.oup.com/comjnl/article/7/1/28/558689/The-co...

sigjuice · on Aug 14, 2017

Page 35 of this paper has this sentence.

One can also envisage a "transpiler" capable of converting a program from one such style into another with considerable generality.

Are there any other known citations of the term "transpiler"?

chrisseaton · on Aug 14, 2017

Yes, there is also this book from 1999, which talks about a Pascal to C 'transpiler' around as early as 1989.

https://books.google.co.uk/books?id=vKQSBwAAQBAJ&pg=PR8&lpg=...

sigjuice · on Aug 17, 2017

The actual author of p2c has the good taste to not call it a transpiler. https://schneider.ncifcrf.gov/p2c/historic/daves.index-2012J...

nostrademons · on Aug 14, 2017

I'd thought a transpiler is a compiler where the source and target language are the same and the semantics of the program are preserved across transpilation (eg. Closure Compiler, Babel, LLVM optimization passes). That's a useful distinction: it implies that you can feed the output of a transpiler back into the input and at worse the result will be idempotent, and that you can mix transpiled output with raw input and the result will still be interpretable.

Evidently Wikipedia has a more broad definition of source-to-source compilation, which I'd agree is pretty meaningless. The distinction between high- and low-level languages is pretty arbitrary; you can write machine-language "source code", so in some ways all compilers are source-to-source.

jlouis · on Aug 14, 2017

We could newspeak it to be an endo-compiler (endo- is greek for internal/within). It runs well with endo-functor.

bhrgunatha · on Aug 14, 2017

Compiler/transpiler - same difference.

I think they've gone out of fashion now, but when I learned about compilers we were introduced to T-diagrams, sometimes called tombstone or bratman diagrams and shown how to bootstrap a compiler for a high level language from scratch [1].

Torben Mogensen's Basics of Compiler Design [2] - Chapter 13 is devoted to bootstrapping a compiler, which introduces them.

[1] scratch meaning a computer (with relevant hardware) with a CPU with it's machine code instructions and some way to make files containing those instructions. You can do this even without assembler, that's the first step. Of course no-one would do this now, but it's a necessary step developing the first high level languages and compilers.

[2] http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/

codingbear · on Aug 14, 2017

Can anyone help me understand what kind of employment are available in the market which requires deep knowledge of compilers?

I love this subject, but I don't know to which domains it is applied on outside of compiling languages like Javascript/JAVA/D/C/Rust and so on on into Machine code.

147 · on Aug 15, 2017

Steve Yegge's rant opened my eyes to how useful knowledge of compilers can be: http://steve-yegge.blogspot.com/2007/06/rich-programmer-food...

yazilimci_adam · on Aug 14, 2017

Thanks. I liked this blog post. I've questions. Why should I prefer LLVM? What is LLVM and how it is different from GCC. I understand, every compiler has three parts. Front end, Optimizer and Back End. Looks simple.

legulere · on Aug 14, 2017

I haven't looked at the GCC code myself, but what I heard is that you can see that it's a codebase that grew massively over a large timespan and things that should be separated are not. At least a few years ago there were actually two different IR formats (the newer one being based on SSA) which are converted between each other several times. I think GCC's IR does not have a canonical text representation, this is really helpful for understanding and debugging in LLVM. Still also LLVM has its weird parts, like their variable naming scheme

CalChris · on Aug 13, 2017

This was really good. I could imagine it being required background reading for a lab of the compilation pipeline in 61C at Berkeley. Add in a section on linking.

sitkack · on Aug 14, 2017

Levine's "Linkers and Loaders" is available online https://www.iecc.com/linker/

stephengillie · on Aug 14, 2017

Great read! I learned a lot about LLVM and Clang. Would like a section on JITs.

lmirosevic · on Aug 14, 2017

Can anyone point out to me where the linker fits into the equation?