An IR (intermediate representation) is "retargetable": meaning new processors ca...

seanalltogether · on Nov 25, 2012

"An IR (intermediate representation) is "retargetable": meaning new processors can be supported with a new backend addon, not a full rewrite."

Why isn't this true of other high level languages like C, C++? Is LLVM just more strict? Could the others be made more strict to improve portability?

mahmud · on Nov 25, 2012

I don't think you understood what I was saying, and others replying to you are also off the mark, at least for the excerpt you quote.

LLVM is a compiler framework. Retargetability is a property of compilers, not languages. Most compilers are retargetable. A good sign is a compiler that generates code for more than two unrelated processors. Retargetability is just an engineering design decision. Most C and C++ compilers (in fact, most non-toy compilers maintained by more than half a dozen people) are retargetable.

Think of retargetability as related to portability, but not the same thing. A compiler can be non-portable but retargetable. Many compilers for small embedded processors target a huge number of processors, while themselves running only on x86 and Windows. That is called cross-compilation :-)
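
To make "a new backend, not a full rewrite" concrete, here is a minimal toy sketch. Everything in it is invented for illustration: the stack-style IR, the opcode names, and both "targets" are hypothetical and have nothing to do with LLVM's real IR or any real instruction set. The point is only that the front end is written once, and each new processor costs one more backend function.

```python
# Toy illustration only: the IR, opcodes, and both "targets" are invented.

def front_end(expr):
    """Lower a nested tuple such as ("add", 1, ("mul", 2, 3)) to a stack-style IR."""
    if isinstance(expr, int):
        return [("push", str(expr))]
    op, lhs, rhs = expr
    # Post-order walk: operands first, then the operator.
    return front_end(lhs) + front_end(rhs) + [(op,)]


def backend_stack_machine(ir):
    """Hypothetical target 1: a stack machine, so the IR maps over almost 1:1."""
    return [" ".join(instr).upper() for instr in ir]


def backend_register_machine(ir):
    """Hypothetical target 2: a register machine; the stack depth picks the register."""
    out = []
    depth = 0
    for instr in ir:
        if instr[0] == "push":
            out.append(f"li r{depth}, {instr[1]}")
            depth += 1
        else:
            # Binary op: combine the top two stack slots into the lower one.
            depth -= 1
            out.append(f"{instr[0]} r{depth - 1}, r{depth - 1}, r{depth}")
    return out


ir = front_end(("add", 1, ("mul", 2, 3)))
stack_asm = backend_stack_machine(ir)
reg_asm = backend_register_machine(ir)
```

Supporting a third processor here would mean writing a third `backend_*` function over the same `ir`; `front_end` would not change.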

zanny · on Nov 25, 2012

Others have responded, but I don't think they get the gist of this question exactly.

C, C++, etc. are just as retargetable as LLVM IR, because you can take IR, C, or C++ code (assuming standards compliance) and get a machine-dependent binary anywhere you have a working compiler / assembler / linker stack. Clang uses LLVM as a translation layer for C and C++ because LLVM IR is much more machine-like, which makes writing the final assembler / linker easier: you are effectively translating abstracted assembly into machine-specific assembly, instead of translating an object-oriented or procedural language straight into machine instructions.

alexk7 · on Nov 25, 2012

It is true of C and C++. In fact, LLVM is mostly used as a C/C++/Objective-C back-end (in Clang). The main advantage of LLVM is that it is a really good representation to perform optimizations on.

mahmud · on Nov 25, 2012

LLVM IR is in SSA (static single-assignment) form, THE most popular IR for compiler research of late. Pretty much a lingua franca; optimization techniques old and new have been adapted to work with SSA.
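
For anyone who hasn't met SSA: every variable is assigned exactly once, so each reassignment gets a fresh name. Below is a minimal sketch of that renaming for straight-line code only. The statement format and the `to_ssa` helper are made up for this example, and it deliberately skips branches, which is where real SSA construction needs phi nodes.

```python
# Sketch: SSA renaming for straight-line code (no branches, so no phi nodes).
# The (dest, op, arg1, arg2) statement format is an assumption of this example.

def to_ssa(stmts):
    """Rename variables so that each name is assigned exactly once."""
    version = {}  # variable name -> latest version number

    def use(arg):
        if isinstance(arg, int):
            return arg  # literals are left untouched
        # A variable read before any assignment is treated as an input (version 0).
        return f"{arg}.{version.get(arg, 0)}"

    out = []
    for dest, op, a, b in stmts:
        # Resolve the operands first so that "x = x * 2" reads the old x.
        a_ssa, b_ssa = use(a), use(b)
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}.{version[dest]}", op, a_ssa, b_ssa))
    return out


program = [
    ("x", "add", "a", "b"),  # x = a + b
    ("x", "mul", "x", 2),    # x = x * 2
    ("y", "add", "x", "a"),  # y = x + a
]
ssa = to_ssa(program)
```

After renaming, a use such as `x.2` points at exactly one definition, which is what makes many optimizations simpler to write.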

derleth · on Nov 25, 2012

C and C++ are too high-level to directly represent a lot of useful things.

One of those things is multiple return values, implemented as a first-class part of LLVM since 2.3:

http://llvm.org/releases/2.3/docs/ReleaseNotes.html

Trying to fit this into C is painful; maybe it's easier in C++. You have to hope the optimizer on your compiler is really quite clever to get the same level of optimization the LLVM tools can provide with less effort because it is, as I said, directly representable in LLVM IR.

You also get memory-use intrinsics such as llvm.lifetime.start and llvm.lifetime.end, which tell the optimizer when a memory object is live, and GC intrinsics such as llvm.gcroot, which help implement garbage collection by preserving information from the high-level code in the intermediate code. All such information would be thrown away in either C or C++ unless you do a lot more work.

http://llvm.org/docs/LangRef.html

stavros · on Nov 25, 2012

Very interesting, thank you. Why doesn't the GNU userland, for example, use LLVM? Although, I imagine gcc compiles to many architectures anyway. Still, the advantages sound many.

mahmud · on Nov 25, 2012

GCC predates LLVM; it also produces better output, has more maintainers and users, and runs on more platforms.

Sanddancer · on Nov 25, 2012

GCC doesn't always produce better output. A lot of networking code, like HTTP daemons, as well as databases like PostgreSQL, tends to be sped up a bit by using llvm/clang. Also, the number of platforms LLVM supports is growing pretty quickly; llvm/clang has support in the base tree for platforms, like the MSP430, that GCC doesn't.