The LLVM Compiler Infrastructure

jclay · on Dec 15, 2018

For those who haven't explored LLVM: it's an incredibly powerful technology, and it's rather approachable to get started with.

A few of my favorite LLVM discoveries:

- If you want to build Clang (the LLVM C/C++ compiler), it's really simple to pull down and build with little external development tooling (only a compiler and CMake), as shown here [0].

- You can pull LLVM/Clang master and use it as a "bleeding-edge" toolchain with a high degree of stability. [0]

- Clang is used to compile Chromium and Firefox on all platforms. Both have seen great performance gains from Clang's link time optimization. [1] [2]

- Klee is built on LLVM infrastructure and is used for automatic generation of test cases. [3] See some of their papers for how powerful their results are.

- Clang can now produce ABI-compatible binaries with MSVC and has support for the Visual Studio debugger. [4]

- Lots of interesting research projects leveraging LLVM [5]

[0] https://www.youtube.com/watch?v=uZI_Qla4pNA

[1] http://blog.llvm.org/2018/03/clang-is-now-used-to-build-chro...

[2] https://glandium.org/blog/?p=3888

[3] https://klee.github.io

[4] http://blog.llvm.org/2017/08/llvm-on-windows-now-supports-pd...

[5] https://scholar.google.com/scholar?as_ylo=2017&q=LLVM&hl=en&...

eslaught · on Dec 15, 2018

Klee is also a great example of the cost of LLVM's never-backwards-compatible approach: according to their home page, the stable release of Klee runs on LLVM 3.4, and support for 3.8 is "experimental". (The LLVM releases since then have included 3.9, 4.0, 5.0, 6.0, and 7.0.)

Don't get me wrong, LLVM is great, but I really wish that as the project matured they had made more of an effort to provide some level of compatibility. As it is, it's a massive pain for open source projects to follow along.

drmeister · on Dec 15, 2018

Counterpoint - I implemented a Common Lisp compiler that uses llvm as the back end (https://github.com/clasp-developers/clasp). I've kept up with the versions from 3.6 to 6.0 and I'm about to upgrade to 7.0.

I'm not sure why other projects find this difficult but I have noticed that if I get too far behind it does get a bit more difficult.

I exposed the C++ API to Common Lisp and write all of the compiler in Common Lisp. The Clang C++ compiler tells me what is broken when I upgrade to a new version of llvm and then I fix it. I talk about it here https://www.youtube.com/watch?reload=9&v=mbdXeRBbgDM&feature... at about 13:30

jcranmer · on Dec 15, 2018

If you compile a bitcode file from anywhere around LLVM 3.3 or later, you can generally read it in the most up-to-date version of LLVM (you may have to first strip debugging metadata though).

The LLVM-C API also maintains compatibility, but that comes at the steep cost of preventing you from using most of the power: LLVM-C is basically limited to reading/writing LLVM code, running passes largely at the -O1/-O2/-O3 level of granularity, and executing the JIT. Writing a pass with that API is very difficult, and pretty much impossible if you want to use any analyses.

jclay · on Dec 15, 2018

That's a good point. Most of the large LLVM projects I've used (such as Root's Cling) seem to lag behind LLVM's versions rather drastically.

jclay · on Dec 15, 2018

My list is mostly C++ focused, but the LLVM tooling for clang-tidy/clang-query is also worth a mention. There is a big set of refactors that you can apply across your codebase, for example if you want to migrate to new C++ features. You can also write your own, for example to move your code to the API of an updated dependency. It's a lower-level version of JavaScript's "codemods" for C/C++.

There's also ongoing work on tooling for interactively creating AST matchers and applying refactors across an entire codebase. [0]

[0] https://steveire.wordpress.com/2018/11/11/future-development...

tanin · on Dec 15, 2018

LLVM is a great project. I'm writing a programming language based on it.

It makes things so much easier because (1) it's easier than writing the translation to machine code yourself and (2) it's also easier than translating your code to C, which would be purely textual.

Nevertheless, it's difficult to learn for people coming from high-level languages, even for me, who knows C to some degree.

The documentation around IR is scarce.

Then I stumbled upon a trick: I can just write C code and compile it to LLVM IR with Clang. With this trick, I can answer many questions on my own (e.g. how to implement if-else, how to call printf, how to make a dynamic-sized array).
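To make the trick concrete, here is a minimal sketch of the kind of C file one might feed to Clang for this. The file name `ir_probe.c` and the function names are made up for illustration; only the `-S -emit-llvm` flags come from Clang itself. Each function isolates one of the questions above, so its IR is easy to find in the output.

```c
/* ir_probe.c: hypothetical file name, used only for this illustration.
 * Emit human-readable IR with:
 *   clang -S -emit-llvm -O0 ir_probe.c -o ir_probe.ll
 */
#include <stdio.h>
#include <stdlib.h>

/* if-else: look for a compare, a conditional branch, and the
 * basic blocks for each arm in the generated IR. */
int clamp_to_zero(int x) {
    if (x < 0) {
        return 0;
    } else {
        return x;
    }
}

/* Calling printf: shows how a variadic external function is
 * declared and called. Returns the number of characters printed. */
int print_value(int x) {
    return printf("value = %d\n", x);
}

/* Dynamic-sized array: heap allocation through malloc, then indexed
 * stores and loads. Returns -1 if the allocation fails. */
int sum_of_squares(int n) {
    if (n <= 0) {
        return 0;
    }
    int *values = malloc((size_t)n * sizeof *values);
    if (values == NULL) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        values[i] = i * i;
    }
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += values[i];
    }
    free(values);
    return total;
}
```

The file needs no `main` for this purpose, since nothing is linked. Compiling at `-O0` keeps the IR close to the source; raising the optimization level shows what the optimizer does to the same constructs.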

jclay · on Dec 15, 2018

Nice tip! I've also been watching Godbolt.org's progress on LLVM IR support, which could be helpful for interactively exploring IR generation.

The results are pretty good today if you select a Clang compiler and pass the `-emit-llvm` flag.

tanin · on Dec 16, 2018

That's exactly what I have been doing: `clang -S -emit-llvm <filename>`.

Godbolt.org is even more handy!

pnloyd · on Dec 15, 2018

That last tip is so clever! I'll have to remember that if I ever want to undergo such a project.

saagarjha · on Dec 15, 2018

> The name "LLVM" itself is not an acronym; it is the full name of the project.

What happened to Low Level Virtual Machine?

cornstalks · on Dec 15, 2018

From Chris Lattner (http://lists.llvm.org/pipermail/llvm-dev/2011-December/04644...):

> "LLVM" is officially no longer an acronym. The acronym it once expanded to was confusing, and inappropriate almost from day 1. :) As LLVM has grown to encompass other subprojects, it became even less useful and meaningless.

> In short, it is just "The LLVM Project", and LLVM doesn't stand for anything anymore. It is a nice short domain name though

stdplaceholder · on Dec 15, 2018

It wasn't ever an acronym. It was an initialism.

nsstring96 · on Dec 15, 2018

I think that originally, it was indeed supposed to be an optimizing virtual machine. IIRC, Lattner's PhD thesis might have been about it.

jcranmer · on Dec 15, 2018

LLVM was always intended to be a compiler IR rather than a "JIT for C". The "virtual machine" in the name refers as much to the idea that the IR is a fairly direct model of an instantiation of the C abstract machine as to any actual ability to perform dynamic optimization.

The big push LLVM made early on was on being able to do ahead-of-time, runtime, and idle-time optimization, the last of which is mostly reflected in profile-guided optimization. The poor quality of its dynamic optimization [1] and the general reluctance to use profile-guided optimization mean that LLVM ended up focusing hard on optimizing ahead of time, like traditional C compilers. The fact that people got too easily confused about what LLVM actually did caused Lattner to de-acronymize it officially several years ago.

[1] A more difficult problem for C and C++ is that actually collecting enough of the code in a single IR representation is surprisingly hard, as build systems are completely and totally inane, and that's before you start trying to figure out how to glue in things like assembly files or glibc.

saagarjha · on Dec 15, 2018

Master's, I believe: http://llvm.org/pubs/2002-12-LattnerMSThesis.html

monocasa · on Dec 15, 2018

Its current tradeoffs make more sense for an offline compiler than for a traditional virtual machine compiler. The best case for using LLVM in a VM is a last-tier compilation that's supposed to approach offline compilation. But even projects that have done that, like JavaScriptCore, have ended up replacing LLVM with their own backend.