QBE vs. LLVM (c9x.me)
295 points by peter_d_sherman on Dec 2, 2020 | hide | past | favorite | 99 comments



Also to note, LLVM isn't the first compiler toolchain of its kind.

Notable mentions,

IBM's research work on the PL.8 compiler, done while developing their first RISC designs.

https://rsim.cs.uiuc.edu/arch/qual_papers/compilers/auslande...

https://pdfs.semanticscholar.org/3288/fc042cd474f0ec93d67753...

https://rishiheerasing.net/modules/hca2102/paper/cocke.pdf

The Amsterdam Compiler Toolkit,

http://tack.sourceforge.net/

https://github.com/davidgiven/ack

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12...

The more the merrier.


Also TenDRA http://www.tendra.org/ (which is about 30 years old).

TenDRA is designed for front-ends to compile to ANDF, an "architecture-neutral distribution format" which is kind of like LLVM bitcode but different. Sadly it never really gained any lasting momentum as an open source project, though it currently has a few active maintainers.

And much more recently there's Cranelift https://github.com/bytecodealliance/wasmtime/tree/main/crane... which is written in Rust and is being used as a back-end for WebAssembly and Rust.


Interesting. Had a look at http://www.tendra.org/developer-orientation; it seems to be even more complex than LLVM. I haven't yet found out what backends it actually supports.


Also the Rose Compiler infrastructure: http://rosecompiler.org/

Though it's more for source transformation and analysis.



LLVM is much, much more than just the IR; e.g. LLVM's scheduling code is sufficiently abstract and compartmentalised that it can be used to (optimistically, last time I checked) predict the performance of machine code not generated by LLVM (llvm-mca).


No one has mentioned the Purdue Compiler Construction Tool Set yet? Weren't they and the Amsterdam compiler kit the two 'big ones' back in the day?


Yes, forgetting about it was a big omission.


libFIRM is another one for the list

https://pp.ipd.kit.edu/firm

edit: libFIRM vs LLVM https://pp.ipd.kit.edu/firm/LLVM.html



[flagged]


LLVM does seem to be explicitly designed from the start to be "industry"-worthy rather than just academic research, though.

It was actually offered to GNU years ago but (if I read the thread correctly) rms didn't get the email because of the unique way he does email.

It's a shame it's not under the GPL; you can already see a sign of the future (present?) in Apple not upstreaming their backends.


If it was licensed under GPL, my bet is that Apple would not have adopted it at all.


That's entirely possible, but does it hurt their bottom line? They switched from GCC, because LLVM was available (to grow and eventually use).

Their aversion to the GPL seems mostly cultural, which is interesting given that we skewer Microsoft for its sins along those lines from 20 years ago.


Apple has an aversion to GPL 3 and not to GPL 2.x because of what 3 forces people to do. It's the same with FreeBSD: once libgcc_s went to 3 they could no longer use it.

It's the same reason why bash and rsync have not been upgraded on macOS to newer versions: they also started using GPL3.


Honestly curious: what's the thing that "[GPLv]3 forces people to do" in the context of those cases? IANAL, but I'm interested in this perspective for choosing a license for some of my current and future works.


Not lock down people's devices, basically. [L]GPLv3 requires that people are able to replace the [L]GPLv3'd binaries in their own devices if they want to, so they can actually take advantage of the code being FLOSS instead of it being some nice theoretical novelty (i.e. it is nice that my router uses Linux, and I find it amusing that it comes with the GPLv2 printed out, but what is the point if I can't open it up and fix/change some things that annoy me?). Apple wants to keep the devices they sell you under tight control, so [L]GPLv3 rubs them the wrong way.

And so they avoid it.

Related: https://en.wikipedia.org/wiki/Tivoization

It might also be about patents since IIRC [L]GPLv3 requires that if you use a [L]GPLv3'd program and that program infringes some of your patents then you are not allowed to sue others over their use of your patents in the same program.


See also "Why is FreeBSD deprecating GCC in favor of Clang/LLVM?":

* https://unix.stackexchange.com/questions/49906/why-is-freebs...

GPL3 is more restrictive than GPL2, which in turn is more restrictive than BSD/MIT, in the name of allowing other freedoms for end-users.


Thanks, this link seems quite comprehensive and the answer author tries to write in a neutral voice - appreciated!


GCC would not have taken it anyway, they tried to prevent parts of GCC getting used independently or as a library to keep the lock-in to GNU, while the architecture of LLVM is of embeddable reusable libraries.


>It was actually offered to GNU years ago but (if I read the thread correctly) rms didn't get the email because of the unique way he does email.

Interesting anecdote. Do you have a source for that?



Can you elaborate it a bit? I didn't see anything that suggested "rms didn't get the email because of the unique way he does email."


probably based on this: https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00...

I don't understand the bits about RMS' email configuration, but in that message in 2015 he indicated that he did not know about the offer in 2005 to give copyright of LLVM to the FSF.


Thanks for the source.


I don't, someone linked it to me on here a week ago so I'll try and dig it up.

Apparently rms never saw the email because of his server arrangement?


Apple doesn't upstream all of their changes as it is, hence why there's a specific Apple column on cppreference.

https://en.cppreference.com/w/cpp/compiler_support

And the bitcode used by the App Store for application delivery isn't 100% the same as LLVM bitcode.


That table doesn't have greens for apple clang that are missing for regular clang. As far as I know they take some time integrating upstream llvm changes, so the two columns are due to that.

Historically they've been very good about upstreaming. Sometimes it takes a while, but so far everything made it to the public repo.


Definitely got sympathy for that position. Upstreaming changes often takes a long time: you have to get context across, get community buy-in to changes, get approval, and cut a release, whereas internally it's often just a code review, which can be prioritised according to business needs.

It's quite common for long standing forks to exist, where changes get pushed upstream from the fork, and the fork gets periodically re-cut.


> Apple doesn’t upstream of their changes already, hence why you have a specific Apple column

It looks to me like it’s the exact opposite, based solely on your link that is. Based on your link, they lag _behind_ upstream (vs being _ahead_ of upstream as you indicated) and don’t offer the same features, without a delay anyway, in “Apple Clang” that are in “Clang”.


Hence why I said it was a sign of the future rather than a prediction. It's not going to get any better for software freedom, especially as we move into the cloud.

We live in a post-FSF world, I think people are taking for granted the bean counters playing nice with open source software.


We need software that works on the client side sans internet connection. Devices are really powerful these days, so it's funny that things keep moving to the cloud. I've been using cloud storage for the past several years, but I wonder why I don't just back up to an external drive or two; the convenience doesn't buy much. I think folks really buy into laziness and technology hype. But fundamentally not much has changed in the software space compared to 20 years ago.


You already know why you don't do backups to a local hard drive. And you won't start doing it either. Don't kid yourself, cloud is convenience.


This is what I do. All my stuff is backed up onto hard drives. I also have an offsite server which takes backups of those backups, just in case.


I still hold out hope that things will get better in the future. Eventually Moore’s law will truly end. We will have to stop once we run up against hard physical limitations.

Why is this a reason for optimism? Because it shifts the outlet for innovation elsewhere. Costs of manufacturing will drop as competition in the fab space increases. I think RISC-V is a sign of things to come. There is great potential for open and amateur hardware.


Bitcode was never compatible anyway, I'm more surprised that they have it in delivery pipeline than anything else.


This needs a (2016) in the title. It's not evident from the text but it is from the announcement: https://www.reddit.com/r/programming/comments/4fysrk/qbe_my_...


I wanted to write an LLVM backend for various instruction sets (e.g. Z80, R216[0]), since those would deal with the problems of optimization passes and register allocation for me, but the LLVM tutorial[1] makes it look so goddamn hard. Does anyone know of a tutorial or of a declarative way to write such backends for either QBE or LLVM? The QBE git repository[2] has a few backends but also looks similarly involved.

Comparatively, writing frontends is quite easy these days and many examples of compiling to IR exist.

[0] https://trigraph.net/powdertoy/R216/manual.md

[1] https://llvm.org/docs/WritingAnLLVMBackend.html

[2] https://c9x.me/git/qbe.git/tree/amd64


Writing backends for integration into compiler stacks like LLVM is a challenge unless one is quite deep into compilers.

If the idea is just to learn how they work, it is easier to dump an IR that is compatible with a macro assembler, then call the assembler with your macro definitions on the generated IR.

It won't win prizes in performance, but one gets to learn how things work and will have a compiler producing proper binaries at the end, and one can fine-tune the macros to improve the quality of the code anyway.
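For instance (macro names and the stack-machine flavor here are made up for illustration), with GNU as the "IR" can be nothing but macro invocations, and the per-target macro file does the instruction selection:

```
# Hypothetical macro definitions (one file per target; x86-64 shown):
.macro PUSH_CONST val
        pushq   $\val
.endm
.macro ADD
        popq    %rbx
        popq    %rax
        addq    %rbx, %rax
        pushq   %rax
.endm

# The compiler then only has to emit "IR" like:
#   PUSH_CONST 2
#   PUSH_CONST 3
#   ADD
```

Retargeting then amounts to writing a new macro file rather than a new backend.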


Writing a compiler backend in a declarative manner is something I've been toying with for a while now, and I don't think it's realistically possible with current technology.

Backends are hard, unfortunately.

However, since you are targeting the Z80, you might be able to just take the IR and transform it yourself? I haven't touched an LLVM backend for a while, so I don't know what the process is atm, but a lot of a modern compiler backend deals with instruction scheduling and the like, which the Z80 doesn't really have a need for.

LLVM is very good for the industry, but it is a little over-engineered(?). Maybe that's a bit harsh, but the difficulty is definitely thrown at you earlier than hacking on most compilers.


It is not purely declarative, but Fraser & Hanson's lcc comes close while still delivering good optimization -- and has a book describing it (IIRC it was written literate-programming style).

It's been 20 years or more; I played with an older version that required an Icon interpreter as part of the build chain, and I think they had a later one that did away with that requirement.

The code generator is basically built from a declarative description of tree matching/rewriting templates with a cost model, and the compiler will find the optimal-cost match.


> Writing compiler backend in a declarative manner is something I've been toying with for a while now, and I don't think it's realistically possible with current technology.

Functional style isn't too bad. Maximal munch lines up with pattern matching, then linear scan is easy enough if you don't need gcc-tier results.

GHC has bits of its backend that are shared across architectures, maybe I should have a look at that...


It's certainly not trivial, but still an interesting journey. I made a backend for MMIX, heavily influenced by the RISC-V implementation, and kept the patches logically consistent by rebasing new changes, so they can serve as an example. The last rebase was a year ago, but I think following a previous implementation is the best way forward, as there is so much boilerplate:

- https://github.com/stuij/mmix-llvm-support

- https://github.com/stuij/mmix-llvm-backend

That said, those kinds of older CPUs aren't a terribly snug fit for LLVM, due to their peculiarities. You'll have to bend LLVM quite a bit to your will, which also means that you need to get quite familiar with it.

edit: this tutorial is even older still, but for me it was one of the best resources, next to checking other simple backends: https://jonathan2251.github.io/lbd/llvmstructure.html




> Implementing a complete C ABI (with struct arguments and returns) is incredibly tricky, and not really a lot of fun.

This is so true. Rust does it here: https://github.com/rust-lang/rust/tree/master/compiler/rustc... and it's a lot of code for no good reason.


+1 I plan to write a toy compiler and considered llvm. This one thing is the reason I will never use llvm for a hobby project.

Even for non-toy compilers, I think it's a waste of human resources that every frontend has to implement the C ABI.


Why does LLVM work that way?


This is surprising for me, I thought LLVM abstracted away platform ABIs.


I’m going to be building a compiler over Christmas break, and I was thinking of targeting the LLVM toolset. This will be my second compiler; the first one emitted x86 directly. Should I consider switching to QBE? I’m more focused on the compiler passes that I want to write; register allocation, etc. is not my thing. I’m going to be working with CPS-conversion and stuff like that. Does anyone have any good tutorials for using either system?


Sounds like you want to compile a functional language. The best tutorial (in terms of being self-contained and being understandable in entirety) I've seen to compile FP to C is[0], I'd imagine that retargeting to LLVM wouldn't be that difficult either.

[0] https://github.com/jozefg/pcf


That’s a great idea honestly. Even Rust would make a good compile target, the strong typing could help debug the compiler as you go. Although LLVM assembly is a good choice too!


Are there any big downsides to compiling to C instead of LLVM? I would assume that it would be less performant but it feels like a shallow preconception.


It's somewhat more constricting, certain types of compiler magic are hard to reproduce in plain C. For instance, dealing with things that need to be stack-aware, such as a GC or tail recursion, takes epic hacks like the Boehm GC or the "Cheney on the MTA" trick. You're also stuck with the C calling convention.

Plus, while generating C is simple, you're entirely on your own. With LLVM IR, you benefit from its infrastructure, the optimization passes and other LLVM-based tools, so you don't need to reinvent the wheel as much. In the end, you'll need some kind of IR anyway, and LLVM is a good place to start.


> Are there any big downsides to compiling to C instead of LLVM?

Well, for starters, you have to import all of C's undefined behavior, so it would be basically impossible to implement, e.g., 2's complement signed integer arithmetic.


Strictly speaking it isn't impossible, just much slower, since you'd have to implement in software what the hardware does.

In practice, though, pretty much every compiler that uses that as an optimization has a flag to disable it, and chances are you'll need other system- and/or compiler-specific stuff anyway.


This looks great! I’m taking a different approach than this compiler, but this looks like a good resource. Thanks!


LLVM is much better documented than any other compiler system, so I would just KISS and use it. LLVM is a beast, though; to get started I'd want to focus on my side of the compiler first (the sema is where the pain is imo).

Remember that there's no reason why you can't have an IR in between your language and the backend.

If your language fits, (based on the CPS it probably won't but still) you can use a subset of it as a representation of that IR and use lowering to rewrite expensive constructs into smaller ones, greatly reducing the surface area you have to work with.


KISS is an acronym for 'keep it simple', which is a surprising choice of words when advocating for the much more complicated option.


Would you rather use a project where you can at the very least copy and paste from another codebase, or, for what is basically the easy bit of the compiler, fiddle around with getting a much less mature project working (possibly a more suitable backend in the long run, but you don't know that yet)?


Better documented doesn't mean the documentation is perfect. My experience two years ago was: You're going to have questions, and it's going to be difficult to find answers.

LLVM is not KISS. Very, very far from it.



You have other options. You could design your own bytecode, and write an interpreter. You could emit C, and use a C compiler. If you use LLVM, you also need to decide whether you want to link against LLVM and use the C++ API. Or you can use bindings for another language. Or you can emit IR as plain text, then feed that into the LLVM compiler.

I have done the latter, and would advise against it. The IR is powerful, but quite involved. The documentation is sometimes lacking. It's an interesting project, but you probably won't get far.

Linking against LLVM means you have to use their C++ interface, and every time you rebuild your compiler, it's going to be slow, with all the templates they use. Your compiler will be a large executable (clang and zig are about 90 MB), and you'd have to rebuild that every time you change your compiler. I would hate having to do that in my spare time, which is why I went the other route.

LLVM is very powerful. There's all these optimizations you get, and debug info. But it's not free. The integration is going to be hard and you'll have to spend quite some time on it.

The next time I would start a language project I would build an interpreter in a dynamic language, and possibly even use a parser generator. https://craftinginterpreters.com/ has a nice path laid out. Focus on what you actually want to do. Once you get somewhere, once you actually like your language, once you actually start a project in your own language, and have an actual need for more, you can write another compiler with more bells and whistles.


cranelift is a better alternative.


why?


Because LLVM is unusable for a dynamic language VM: the compile times are too slow, and it's way too big. Cranelift is also big, but has sensible staged optimizers, i.e. it doesn't behave like Java or LLVM.


We've been JITing using LLVM for a number of years without an obvious problem. It obviously depends on the size of the code you are wanting to JIT, but the OP was discussing a toy language as a side project, and LLVM is certainly perfect for that use case.


> cranelift is also big, but has sensible staged optimizers

LLVM gives you the freedom to decide what optimizers to use. But it's true it is too big for many applications, even if there are build options for different sizes.


This depends on your use case, I guess. For long-running numerical code (Julia), LLVM is fine, but it needs some work by highly skilled individuals to make it feel like an interpreted REPL when importing big code bases.


Consider generating WASM.


What was your first compiler?



I wish something like this existed for garbage-collected languages. The closest thing I know of is the JVM (and other similar VMs), but I'm hoping for ahead-of-time compilation.

The key difficulty in adding garbage collection to code emitted with tools like this is being able to accurately tell what the root set is.

(I know LLVM has support for this in theory, but I have found nothing that tells how to actually use it)


There are LLVM backed languages that provide garbage collection; Julia and Crystal are the first two that come to mind. Haskell also has an LLVM back-end.


The Mu Micro Virtual Machine might interest you: https://news.ycombinator.com/item?id=25278568


https://docs.oracle.com/en/java/javase/15/docs/specs/man/jao...

I thought this wasn't universally available (that it was only in a special version of the JDK - https://openjdk.java.net/projects/graal/). But it appears to be everywhere, albeit experimentally.


Something like the CIL[0]? I know there was a python and a ruby implementation on top of it.

[0] https://en.wikipedia.org/wiki/Common_Intermediate_Language


That's about viable nowadays.

Elaborating: until recently, your best option for running CIL in an AOT-type environment would be Mono AOT, which has its own limitations.

Nowadays, however, we do have CoreRT in a state that lets you do AOT, still with limitations around things like reflection. But, it's fairly doable nowadays, I would think.


I wish there were a pure .NET project that at least matched QBE. Garbage collection would be interesting to tinker with on top of it.


There's a version of the JDK that provides AOT.


One important thing to note is that QBE emits text-based assembly and relies on a system-installed assembler. Meanwhile LLVM produces object files.


> "Implementing a complete C ABI (with struct arguments and returns) is incredibly tricky, and not really a lot of fun."

This sounds like it should be wrapped as a helper library on top of LLVM and shipped inside LLVM. Why is that not the case?


Doesn't allowing non-SSA forms like

     %v =w add %v, 1

defeat the purpose of using SSA in the first place? If this form is allowed, then I cannot rely on `%v` having an immutable value, which is much of the reason why we use SSA.


Not if the framework can readily create v0, v1 from the temporaries. It'll still end up as a full SSA according to what they describe.
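Concretely, using the IL syntax from the QBE docs (the renamed temporaries here are illustrative), the SSA construction pass just introduces fresh names:

```
# as written, reusing %v:
%v =w add %v, 1

# after SSA construction, each definition gets a fresh temporary:
%v1 =w add %v0, 1
```

So the relaxed input form is a convenience for frontends, not a hole in the SSA invariants the optimizer relies on.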


Is it a good idea to use BNF for documentation? This form of document doesn't seem to be very useful for people who want to use the language.

Both WASM and javascript docs do this, and it is also the same here https://c9x.me/compile/doc/il.html

I think LLVM's IR document is not easy to read, this one is not better.

I don't want to reimplement a compiler for the IR language, I want to know how to use it, how to generate code using it. I need to see examples, e.g. how to implement threading, coroutines, weakly typed variables, reflection... not the BNF.


I am also working on a compiler and need to decide on a backend soon. I currently plan to use C (clang) as IL in the first iteration, with the option to upgrade to LLVM IL later. I wonder whether there is any easier alternative. I definitely need to start with a framework that takes text input, because everything else would be hard to integrate with the TypeScript-based interpreter of my language. And it needs to work with a runtime written in C. QBE looks nicer than LLVM's IL, but I wonder whether it would cause more trouble in the end.


Since this version of your language will never be used by thousands of users (if you ever hit success you'll want to rewrite your compiler anyway), QBE is a lot simpler to use than LLVM. The main limitation is the limited set of target architectures, and probably the slow development. In effect, Linux AMD64 and ARM64 are the only well-supported architectures. Also, you may be able to compile to Windows using the mingw assembler.


Another similar project is https://github.com/vnmakarov/mir


I would love to have a way to bootstrap LLVM. This looks promising but I'm guessing a C++ frontend is much more challenging than writing this backend.

gcc used to be the missing link, but they switched to C++ and now you can't bootstrap it with something like tcc.


Can't you still bootstrap LLVM from GCC, GCC from the last C-only GCC release, and that from tcc?

GNU Guix works on the bootstrapping issue; they strive to reduce the seed size as much as possible: https://guix.gnu.org/blog/tags/bootstrapping/


QBE https://c9x.me/compile/ "QBE aims to be a pure C embeddable backend that provides 70% of the performance of advanced compilers in 10% of the code"


What I’d like is something similar to QBE, but that actually compiles to/targets LLVM.

So, essentially, a beginner-friendly intermediate layer on top of LLVM.


You could transpile to C and feed that to LLVM/Clang or e.g. to TCC (https://github.com/TinyCC/tinycc).


Sure, but the idea is to avoid C and compile to a simpler IR, like QBE's, and then compile that to native code in 1/10 of the time and memory LLVM or GCC use.


The question was "a beginner-friendly intermediate layer on top of LLVM". If you use TCC instead of Clang or GCC compilation speed is very fast.


It's for ahead-of-time compilation only, right? I gather it always outputs assembly files.


> QBE is extremely hackable.

and

> it is programmed in non-fancy C99 without any dependencies.

These are contradictory. C99 is not an "extremely hackable" language. If the compiler isn't designed to make fast binaries, then they should have written it in a higher-level language with fewer foot guns.



I only hope that we don't get a lot of new languages which are not only not production-ready but also poorly designed. ;)


What's the harm? Just don't use them.




