LLVM IR is changing at a high rate. While there are some weak backward-compatibility promises (older, but not too old, bitcode can be read by newer official IR readers), new features are introduced overnight. I expect keeping this project up to date with LLVM trunk to be a high-maintenance job.
@krasin The LLVM IR assembly language does change at a rather high rate. Not so often as to make this infeasible, but an external project definitely needs to make sure it can keep up with maintenance to remain useful. Since the llir/llvm project started back in 2014, the LLVM IR assembly language has had one major change, and that is the new metadata syntax introduced in LLVM 3.6.
To facilitate maintenance efforts over time, a BNF grammar for LLVM IR assembly has been written, from which the lexer and parser are generated [1,2,3]. The intention is for this BNF grammar to eventually become the basis or starting ground for an official BNF grammar of the LLVM IR assembly language (but that's a different project altogether, and a huge effort in itself).
@mewmew, have you already added support for .bc files which have multiple bitcode modules (aka merged modules)? These are used in the work-in-progress ThinLTO.
@krasin Thanks for the input. From my understanding, ThinLTO is intended to bring the compilation speed of LTO builds closer to that of non-LTO builds. Prior to the change you are referring to with merged modules, it seems this was achieved by optimizing multiple .bc modules in parallel during link time (using summary information from thin-link) [1].
Is there any high-level information on the design behind merged modules? Are they simply a concatenation of .bc files with a table of file offsets for each module?
I am new to ThinLTO and the work related to merged modules, so any information providing insight would be appreciated.
As for the llir/llvm project: it includes a .ll parser, but relies on the LLVM toolchain for converting .bc files into .ll, i.e.
llvm-dis -o foo.ll foo.bc
This decision has been taken so that we can focus time on maintaining good support for one of the isomorphic LLVM IR forms.
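For anyone who wants to script that conversion step from Go before handing the result to a .ll parser, here is a minimal sketch using only the standard library. It assumes llvm-dis is on your PATH; the helper names disArgs and disassemble are made up for illustration and are not part of llir/llvm.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// disArgs builds the llvm-dis arguments for bcPath and returns them together
// with the output path, which is bcPath with its extension replaced by .ll.
// (Hypothetical helper, not part of llir/llvm.)
func disArgs(bcPath string) (args []string, llPath string) {
	llPath = strings.TrimSuffix(bcPath, filepath.Ext(bcPath)) + ".ll"
	return []string{"-o", llPath, bcPath}, llPath
}

// disassemble runs `llvm-dis -o foo.ll foo.bc` and returns the path of the
// generated .ll file. It assumes llvm-dis can be found on PATH.
func disassemble(bcPath string) (string, error) {
	args, llPath := disArgs(bcPath)
	cmd := exec.Command("llvm-dis", args...)
	if out, err := cmd.CombinedOutput(); err != nil {
		return "", fmt.Errorf("llvm-dis %s: %v: %s", bcPath, err, out)
	}
	return llPath, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: bc2ll FILE.bc")
		os.Exit(2)
	}
	llPath, err := disassemble(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// The printed path is the textual IR that a .ll parser can consume.
	fmt.Println(llPath)
}
```

Keeping the argument construction separate from the process call makes the path handling easy to check without LLVM installed.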
Any application which requires good performance should definitely make use of the official LLVM C++ library for interacting with LLVM IR.
The llir/llvm project is intended for those who wish to write tools in Go which consume, produce, process or manipulate LLVM IR.
Future releases of llir/llvm will try to get closer in performance to the official LLVM C++ library, but at this point of the project the aim is to iron out a good API for interacting with LLVM IR, and to have fun coding :)
For those interested, the llir/llvm project was born to support the requirements of a decompiler project [2] which decompiles LLVM IR to Go source code. The llir/llvm project has since become a general purpose library, and is now looking for anyone curious to try it out at this early stage to provide feedback on its API and design.
@mewmew generally, .bc files with multiple bitcode modules have existed for a while (almost a year now). The change I referred to was just an incremental (breaking) improvement of that scheme.
I don't remember a good doc on how it's implemented. The last time I tried to understand it was early February, and things had been changing at crazy speed. Your best bet is to ask on the llvm-dev mailing list or the #llvm IRC channel. Either tejohnson@ or pcc@ will know for sure.
@krasin Thanks! I know I'll continue to play around with LLVM on many levels in life. Just recently an LLVM Socials meetup has started in Sweden, and it feels great to get a chance to meet other people excited about exploring these topics :)
Maybe this would lend itself well to parsing bitcode slices (serialized LLVM IR). I've often wanted to diff bitcode produced by different compiler versions.
What's wrong with llvm-diff [1]? Is it that you want to compare across bitcode versions that weren't backwards compatible (so the current llvm-diff won't parse very old bitcode)?
llvm-diff is indeed pretty great, but having the IR structure as an AST that you could programmatically access and manipulate would allow more semantic diffing, or integration into workflows that would benefit from structured representations of a diff.
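To make "structured diff" a bit more concrete, here is a rough sketch of a function-level diff between two textual IR modules. With a real parser you would walk the module's function list; as a stand-in that needs only the standard library, this uses a simplistic regexp over `define` lines, so treat it as an illustration of the output shape rather than a robust tool. The names definedFuncs and diffFuncs are made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// defineRE matches lines such as `define i32 @main(...) {` and captures the
// function name. Simplification: quoted names like @"foo bar" are not handled.
var defineRE = regexp.MustCompile(`^define\s[^@]*@([-\w$.]+)\s*\(`)

// definedFuncs returns the set of function names defined in the textual IR.
// Note: bufio.Scanner's default limit means very long lines would stop the scan.
func definedFuncs(ll string) map[string]bool {
	names := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(ll))
	for sc.Scan() {
		if m := defineRE.FindStringSubmatch(sc.Text()); m != nil {
			names[m[1]] = true
		}
	}
	return names
}

// diffFuncs reports which function definitions exist only in the old module
// (removed) and which exist only in the new one (added), both sorted.
func diffFuncs(oldLL, newLL string) (removed, added []string) {
	oldSet, newSet := definedFuncs(oldLL), definedFuncs(newLL)
	for name := range oldSet {
		if !newSet[name] {
			removed = append(removed, name)
		}
	}
	for name := range newSet {
		if !oldSet[name] {
			added = append(added, name)
		}
	}
	sort.Strings(removed)
	sort.Strings(added)
	return removed, added
}

func main() {
	oldLL := "define i32 @f() {\n  ret i32 0\n}\ndefine void @g() {\n  ret void\n}\n"
	newLL := "define i32 @f() {\n  ret i32 0\n}\ndefine void @h() {\n  ret void\n}\n"
	removed, added := diffFuncs(oldLL, newLL)
	fmt.Println("removed:", removed) // prints "removed: [g]"
	fmt.Println("added:", added)     // prints "added: [h]"
}
```

Because the result is plain data (two sorted slices), it can feed a CI check or a report generator directly, which is the kind of workflow integration a text-only diff makes awkward.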