
The other thought I had here is that, AFAICT, IR is not a standard. There is no requirement that it remain compatible in 50 years or 5 months. There is no standard IR, and there shouldn't be, as that would become an impediment to compiler evolution and to optimizing for newer architectures.

Doesn't AS/400 use an IR approach as well? Which let IBM seamlessly migrate the underlying CPU a few(?) times now?




Almost every modern compiler uses some form of intermediate representation. The choice of IR is shaped by history and design. As the posted article shows, LLVM uses an SSA-based IR to describe programs. GCC, in contrast, uses two IRs: GIMPLE and a Lisp-based IR called RTL. GHC uses Core (essentially Haskell without the syntactic sugar).

The purpose of every IR is to remove the ambiguities and language complexities of programs. By simplifying programs into a series of statements such as "%3 = op $type %1, %2", generic optimizers can be built easily. Certain language-specific optimizations can be written in the frontend of the compiler, since it has knowledge of the language being compiled. Generic LLVM IR may not be well suited to issues such as devirtualization in C++ (though there is work being done in that area).
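To make that "%3 = op $type %1, %2" shape concrete, here is a minimal, hand-written sketch of a small function in LLVM IR (illustrative only; the names are made up and actual clang output varies by version and optimization flags):

```llvm
; rough SSA form of: int add3(int a, int b, int c) { return a + b + c; }
define i32 @add3(i32 %a, i32 %b, i32 %c) {
entry:
  %t0 = add nsw i32 %a, %b   ; each value is assigned exactly once (SSA)
  %t1 = add nsw i32 %t0, %c  ; operands refer to earlier definitions by name
  ret i32 %t1
}
```

Because every value has exactly one definition, a generic optimizer can walk use-def chains directly without tracking reassignments, which is what makes passes like constant propagation and dead-code elimination easy to write against this form.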

LLVM's IR undergoes fairly frequent changes to better handle new problems.


> The other thought I had here, is that AFAICT IR is not a standard. There is no requirement that it remains compatible in 50 years or 5 months.

Correct. Libfirm[0] is the only compiler I am aware of that attempts to use a "firm" IR.

[0] http://pp.ipd.kit.edu/firm/Index


The ACK's IR (which it calls EM code) has been stable for 34 years.

http://tack.sourceforge.net/olddocs/em.pdf


Neato! Thanks for the link.

Sorry if the tone of "only compiler I am aware of" came off as snotty; it was meant as an expression of my naivety on the subject.


Nah, don't worry; nobody's heard of the ACK! (Mostly because it's so old that it doesn't really believe that registers exist, which means it doesn't get on well with modern architectures.)


> Doesn't AS/400 use an IR approach as well? Which let IBM seamlessly migrate the underlying CPU a few(?) times now?

Yes, it uses an IR approach. MI code is essentially a bytecode to which programs are compiled; the OS then in turn compiles it to the underlying machine code. AS/400 (or IBM i, as it is now officially called) did use this to help in the switch from a proprietary/undocumented CISC CPU architecture (apparently similar but not identical to the IBM mainframe instruction set) to PowerPC/POWER. However, binaries are stored with both the MI code and the machine code together, and there is an option, "Delete Program Observability", which removes the MI code section and makes that migration strategy impossible. In practice, many people didn't choose that option, and even if you did, so long as you or the vendor still have the source code, it is just a recompile to fix it.

AS/400 has two program models – OPM (Original Program Model) and ILE (Integrated Language Environment). Basically, the original object code format, runtime library, etc., were designed for use with RPG/COBOL/PL/I and didn't work very well with languages such as C and C++, so ILE was created to remedy that deficiency. The relevance to IR is that OPM and ILE actually use two different MI formats: Old MI (OMI) for OPM and New MI (NMI) for ILE. (NMI is also called "W-code".) IBM has publicly documented most of the details of OMI and provides a public API to convert OMI code to machine code. By contrast, my understanding is that they've chosen to keep the details of NMI confidential, and there are no public APIs to convert it to machine code.


> Doesn't AS/400 use an IR approach as well? Which let IBM seamlessly migrate the underlying CPU a few(?) times now?

In these systems, both something like IR and the final assembly are stored in the binary. The OS recompiles the IR for the current CPU if necessary and replaces the assembly in the binary. That way there is no compilation overhead unless the architecture changes.


IR isn't a standard, but there are efforts to standardize a stable subset. This would be useful for projects like Google's PNaCl and Apple's bitcode. I'm guessing they currently must invest a lot of effort to stay in sync with upstream LLVM.



