An IR (intermediate representation) makes a compiler "retargetable", meaning new processors can be supported by adding a new backend rather than doing a full rewrite.
Compilation, especially for the kind of language LLVM is most used for (impure, procedural, object-oriented, with optional garbage collection), lends itself to very similar optimization techniques. Optimization happens in phases, one after the other, and sometimes a phase needs to be repeated after the code undergoes further "simplification". Since the phases pretty much operate on the same data format, and since they're mostly portable across processor architectures in wide use today, it makes sense to delay actual object code generation until last.
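To make the "repeat a phase after simplification" point concrete, here is a minimal sketch in Python. The tuple-based IR and the three pass names are made up for illustration; this is not how LLVM's pass manager works, only the general idea of running target-independent phases over one shared data format until nothing changes.

```python
# Toy IR (hypothetical): a list of (dest, op, args) tuples.
# args holds integer literals or variable names (strings).

def propagate_constants(code):
    """Substitute known constant values for variable operands."""
    known = {}
    out = []
    for dest, op, args in code:
        args = tuple(known.get(a, a) if isinstance(a, str) else a for a in args)
        if op == "const":
            known[dest] = args[0]
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
        out.append((dest, op, args))
    return out

def fold_constants(code):
    """Replace an add/mul of two integer literals with a const."""
    out = []
    for dest, op, args in code:
        if op in ("add", "mul") and all(isinstance(a, int) for a in args):
            value = args[0] + args[1] if op == "add" else args[0] * args[1]
            out.append((dest, "const", (value,)))
        else:
            out.append((dest, op, args))
    return out

def eliminate_dead_code(code, live_out):
    """Drop instructions whose result is never used afterwards."""
    live = set(live_out)
    kept = []
    for dest, op, args in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(a for a in args if isinstance(a, str))
            kept.append((dest, op, args))
    kept.reverse()
    return kept

def optimize(code, live_out):
    """Run the phases in order, repeating until the code stops changing."""
    while True:
        new = propagate_constants(code)
        new = fold_constants(new)
        new = eliminate_dead_code(new, live_out)
        if new == code:
            return code
        code = new

program = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("c", "add", ("a", "b")),
    ("d", "mul", ("c", 4)),
    ("e", "add", ("a", 1)),  # result never used
]
optimized = optimize(program, live_out={"d"})
```

In this example the multiplication can only be folded on the second round, after the first round has turned `c` into a constant, which is why the loop keeps going until a round changes nothing. None of the passes know anything about a target processor.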
The final assembly instructions generated are as good as "machine code". In fact, the whole reason it's called "assembly", as opposed to "compilation" or even "translation", is that there is a (nearly) 1:1 correspondence between assembly instructions and machine instructions. Many assemblers use an instruction hash table to look up encodings by template :-)
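A rough sketch of the table-lookup idea, in Python. The instruction set, opcodes and register names here are invented for illustration and do not correspond to any real processor; real assemblers also deal with labels, relocations and variable-length encodings, which this ignores.

```python
# Hypothetical ISA: every opcode and encoding below is made up.
# Key: (mnemonic, operand kinds...) -> opcode byte.
TEMPLATES = {
    ("mov", "reg", "imm"): 0x10,
    ("mov", "reg", "reg"): 0x11,
    ("add", "reg", "reg"): 0x20,
    ("ret",): 0xFF,
}
REGISTERS = {"r0": 0, "r1": 1, "r2": 2, "r3": 3}

def classify(operand):
    """Decide which operand kind a piece of text is."""
    return "reg" if operand in REGISTERS else "imm"

def encode_operand(operand):
    """Encode a register number or an 8-bit immediate as one byte."""
    if operand in REGISTERS:
        return REGISTERS[operand]
    return int(operand, 0) & 0xFF

def assemble_line(line):
    """Translate one assembly line into bytes via a single table lookup."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [o.strip() for o in rest.split(",") if o.strip()]
    key = (mnemonic, *map(classify, operands))
    opcode = TEMPLATES[key]  # KeyError: no such instruction form
    return bytes([opcode, *map(encode_operand, operands)])

def assemble(source):
    """Assemble a multi-line program, skipping blank lines."""
    return b"".join(assemble_line(l) for l in source.splitlines() if l.strip())

machine_code = assemble("mov r0, 5\nmov r1, r0\nadd r0, r1\nret")
```

Each source line maps to exactly one encoded instruction, which is the 1:1 correspondence in miniature: the dictionary lookup does essentially all the work.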
Even when assembled, the program is not in a runnable state, as many of its external symbols and dependencies are not resolved. A file is its own compilation unit, so functions referenced in other files in the project, in dependency libraries, or in standard system libraries and system calls in use have to be either resolved or registered somewhere handy for quick resolution later. Static linking does the former, dynamic linking the latter. If the object file exports symbols for use by others, it might need to be made into a shared library.
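Here is a toy model of that resolution step in Python. The object-file layout (`size`, `defines`, `needs`) and the `link` function are hypothetical and far simpler than real formats like ELF; the sketch only shows the difference between resolving a symbol to an address now and recording it for later.

```python
def link(objects, shared_exports=()):
    """Resolve symbols across toy object files.

    Returns (symbols, imports): final addresses for symbols defined in the
    given objects, and names left for the dynamic loader to resolve later.
    """
    symbols = {}
    base = 0
    # Lay the objects out one after another and record each definition.
    for obj in objects:
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset
        base += obj["size"]

    imports = []
    for obj in objects:
        for name in obj["needs"]:
            if name in symbols:
                continue  # resolved now ("static" case)
            if name in shared_exports:
                if name not in imports:
                    imports.append(name)  # registered for later ("dynamic" case)
            else:
                raise ValueError(f"undefined symbol: {name}")
    return symbols, imports

# Two compilation units; printf is assumed to live in a shared library.
main_o = {"size": 32, "defines": {"main": 0}, "needs": ["helper", "printf"]}
util_o = {"size": 16, "defines": {"helper": 4}, "needs": []}

symbols, imports = link([main_o, util_o], shared_exports={"printf"})
```

Leaving `util_o` out of the link makes `helper` an undefined symbol, which is the familiar linker error you get when a compilation unit is missing.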
As you can imagine, all this work is both very platform-specific and tedious. Hence one might want to avoid the final "weaving" of the binary and leave it to someone who is intimately familiar with the target environment: someone like the vendor assembler and linker or, better yet, the high-quality binary tools from the good folks at GNU :-)
I don't think you understood what I was saying, and others replying to you are also off the mark, at least for the excerpt you quote.
LLVM is a compiler framework. Retargetability is a property of compilers, not languages. Most compilers are retargetable. A good sign is a compiler that generates code for more than two unrelated processors. Retargetability is just an engineering design decision. Most C and C++ compilers (in fact, most non-toy compilers maintained by more than half a dozen people) are retargetable.
Think of retargetability as related to portability, but not the same. A compiler can be non-portable but retargetable. Many compilers for small embedded processors target a huge number of processors while themselves running only on x86 and Windows. That is called cross-compilation :-)
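As a sketch of what "retargetable" means as a design, in Python: one shared IR, with a small backend per target. The IR tuples, the two target names and their instruction syntax are all hypothetical, not real instruction sets.

```python
# Hypothetical IR: (op, dest, a, b) meaning dest = a op b.
IR = [("add", "t0", "x", "y"), ("mul", "t1", "t0", "x")]

def backend_three_address(ir):
    """Made-up RISC-style target: op dest, src1, src2."""
    return [f"{op} {dest}, {a}, {b}" for op, dest, a, b in ir]

def backend_two_address(ir):
    """Made-up two-operand target: the destination doubles as first source.

    Assumes dest differs from b, which holds for the example IR above.
    """
    out = []
    for op, dest, a, b in ir:
        out.append(f"mov {dest}, {a}")
        out.append(f"{op} {dest}, {b}")
    return out

# Supporting a new processor means adding one entry here, not a rewrite.
BACKENDS = {
    "risc3": backend_three_address,
    "cisc2": backend_two_address,
}

def compile_for(ir, target):
    """Lower the shared IR with the backend chosen for the target."""
    return BACKENDS[target](ir)

risc_asm = compile_for(IR, "risc3")
cisc_asm = compile_for(IR, "cisc2")
```

The script itself runs on whatever machine has a Python interpreter, yet emits code for either made-up target. The machine the compiler runs on and the machine it generates code for are independent choices, which is the cross-compilation point.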
Others have responded, but I don't think they get the gist of this question exactly.
C, C++, etc. are just as retargetable as LLVM, because you can take IR code, C, or C++ code (assuming standards compliance) and get a machine-dependent binary anywhere you have a working compiler / assembler / linker stack. Clang uses LLVM as a translation layer for C and C++ because LLVM IR code is much more machine-like and makes writing the final assembler / linker easier: you are effectively just translating abstracted assembly into machine-specific assembly, instead of translating an object-oriented / procedural language into machine instructions.
It is true of C and C++. In fact, LLVM is mostly used as a C/C++/Objective-C back-end (in Clang). The main advantage of LLVM is that it is a really good representation to perform optimizations on.
LLVM IR is in SSA (static single-assignment) form, THE most popular IR form for compiler research of late. It is pretty much a lingua franca; optimization techniques old and new have been adapted to work with SSA.
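For anyone who hasn't seen SSA, here is a minimal Python sketch of the renaming idea, restricted to straight-line code. The tuple IR and the `name.N` versioning scheme are my own invention for illustration; real SSA construction also has to place phi nodes at control-flow joins, which this skips entirely.

```python
def to_ssa(code):
    """Rename variables so every name is assigned exactly once.

    code: list of (dest, op, args); args are variable names or ints.
    Straight-line code only: no branches, hence no phi nodes.
    Version 0 of a name stands for its value on entry.
    """
    version = {}
    out = []
    for dest, op, args in code:
        # Uses refer to the most recent version of each variable.
        new_args = tuple(
            f"{a}.{version.get(a, 0)}" if isinstance(a, str) else a
            for a in args
        )
        # Each assignment creates a fresh version of the destination.
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}.{version[dest]}", op, new_args))
    return out

code = [
    ("x", "add", ("a", "b")),
    ("x", "mul", ("x", 2)),   # reassigns x
    ("y", "add", ("x", "a")),
]
ssa = to_ssa(code)
```

After renaming, the two assignments to `x` become `x.1` and `x.2`, so every use points at exactly one definition. That property is what makes many optimizations simpler to state over SSA.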
Trying to fit this into C is painful; maybe it's easier in C++. You have to hope the optimizer in your compiler is really quite clever to get the same level of optimization that the LLVM tools can provide with less effort, because it is, as I said, directly representable in LLVM IR.
You also get things like memory use intrinsics, such as llvm.lifetime.start and llvm.lifetime.end, which mark where a memory object's lifetime begins and ends and so preserve information from the high-level code in the intermediate code (garbage collection support goes through separate intrinsics, such as llvm.gcroot). All such information would be thrown away in either C or C++ unless you do a lot more work.
Very interesting, thank you. Why doesn't the GNU userland, for example, use LLVM? Although, I imagine gcc compiles to many architectures anyway. Still, the advantages sound many.
GCC doesn't always produce better output. A lot of networking code, like HTTP daemons, as well as databases like PostgreSQL, tend to be sped up a bit by using LLVM/Clang. Also, the number of supported platforms for LLVM is growing pretty quickly; LLVM/Clang has support in the base tree for platforms like the MSP430, for example, that mainline GCC did not support at the time of writing.