I’m going to be building a compiler over Christmas break, and I was thinking of targeting the LLVM toolset. This will be my second compiler; the first one emitted x86 directly. Should I consider switching to QBE? I’m more focused on the compiler passes that I want to write; register allocation, etc. is not my thing. I’m going to be working with CPS conversion and stuff like that. Does anyone have any good tutorials for using either system?
Sounds like you want to compile a functional language. The best tutorial I've seen for compiling FP to C (in terms of being self-contained and understandable in its entirety) is [0]. I'd imagine that retargeting to LLVM wouldn't be that difficult either.
That’s a great idea, honestly. Even Rust would make a good compile target; the strong typing could help debug the compiler as you go. LLVM assembly is a good choice too, though!
Are there any big downsides to compiling to C instead of LLVM? I would assume that it would be less performant but it feels like a shallow preconception.
It's somewhat more constricting; certain types of compiler magic are hard to reproduce in plain C. For instance, dealing with things that need to be stack-aware, such as a GC or tail recursion, takes epic hacks like the Boehm GC or the "Cheney on the MTA" trick. You're also stuck with the C calling convention.
Plus, while generating C is simple, you're entirely on your own. With LLVM IR, you benefit from its infrastructure, the optimization passes and other LLVM-based tools, so you don't need to reinvent the wheel as much. In the end, you'll need some kind of IR anyway, and LLVM is a good place to start.
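To make the tail-recursion point concrete: the usual workaround when the target gives you no guaranteed tail calls is a trampoline. This is a simpler technique than "Cheney on the MTA", not the same thing, and the sketch below only models it in Python (the names `trampoline` and `count_down` are made up for illustration). Each would-be tail call returns a thunk instead of calling, and a driver loop runs the thunks, so the stack never grows.

```python
def trampoline(thunk):
    """Keep calling thunks until a non-callable final value comes back.

    Assumption of this sketch: final results are never callable.
    """
    while callable(thunk):
        thunk = thunk()
    return thunk


def count_down(n, acc=0):
    # Sums n + (n-1) + ... + 1 using an accumulator.
    if n == 0:
        return acc
    # Instead of making the tail call directly, return a thunk describing it.
    return lambda: count_down(n - 1, acc + n)


# Far deeper than Python's default recursion limit, but the stack stays flat.
result = trampoline(count_down(100_000))
```

Generated C would do the same with function pointers and a driver loop, which is part of why it counts as a hack: every call pays for the bounce.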
> Are there any big downsides to compiling to C instead of LLVM?
Well, for starters, you have to import all of C's undefined behavior, so it would be basically impossible to implement, e.g., 2's complement signed integer arithmetic.
Strictly speaking it isn't impossible, just much slower, since you'd have to implement in software what the hardware does.
In practice, though, pretty much every compiler that uses that as an optimization has a flag to disable it, and chances are you'll need other system- and/or compiler-specific stuff anyway.
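For what it's worth, here is a rough sketch of how a C-emitting compiler could spell out wrapping 32-bit addition without relying on signed overflow. The helper names are hypothetical. The emitted expression does the add on `uint32_t`, where wraparound is defined, and converts back; that final conversion is implementation-defined rather than undefined, and mainstream compilers wrap. The second function is just a reference model for testing. The flag alluded to above is `-fwrapv` on GCC and Clang.

```python
MASK32 = 0xFFFFFFFF


def emit_wrapping_add(lhs: str, rhs: str) -> str:
    """Return a C expression adding two int32_t operands with wraparound.

    Assumes the generated file includes <stdint.h>.
    """
    return f"(int32_t)((uint32_t)({lhs}) + (uint32_t)({rhs}))"


def wrapping_add_i32(a: int, b: int) -> int:
    """Reference model of 32-bit two's complement addition."""
    total = (a + b) & MASK32  # keep the low 32 bits
    # Reinterpret the top bit as the sign bit.
    return total - (1 << 32) if total & 0x80000000 else total


c_expr = emit_wrapping_add("a", "b")
overflowed = wrapping_add_i32(2**31 - 1, 1)  # INT32_MAX + 1 wraps to INT32_MIN
```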
LLVM is much better documented than any other compiler system, so I would just KISS and use it. LLVM is a beast, though; to get started I'd want to focus on my side of the compiler first (the sema is where the pain is, imo).
Remember that there's no reason why you can't have an IR in between your language and the backend.
If your language fits, (based on the CPS it probably won't but still) you can use a subset of it as a representation of that IR and use lowering to rewrite expensive constructs into smaller ones, greatly reducing the surface area you have to work with.
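A toy illustration of that kind of lowering, assuming a made-up tuple-based AST where `("and", a, b)` and `("or", a, b)` are surface constructs and `("if", cond, then, else)` is in the core subset. It also assumes the operands are boolean-valued and side-effect free, so no temporaries are needed.

```python
def lower(expr):
    """Rewrite 'and'/'or' nodes into 'if' nodes, recursively."""
    if not isinstance(expr, tuple):
        return expr  # literals and variable names pass through unchanged
    op, *args = expr
    args = [lower(a) for a in args]  # lower children first
    if op == "and":
        a, b = args
        return ("if", a, b, False)  # a and b  ==>  if a then b else False
    if op == "or":
        a, b = args
        return ("if", a, True, b)   # a or b   ==>  if a then True else b
    return (op, *args)


lowered = lower(("and", "x", ("or", "y", "z")))
```

After this pass, everything downstream only has to handle `if`, not three separate constructs.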
Would you rather use a project where you can, at the very least, copy and paste from another codebase, or, for what is basically the easy bit of the compiler, fiddle around with getting a much less mature project working (possibly a more suitable backend in the long run, but you don't know that yet)?
Better documented doesn't mean the documentation is perfect. My experience two years ago was: You're going to have questions, and it's going to be difficult to find answers.
You have other options. You could design your own bytecode, and write an interpreter. You could emit C, and use a C compiler. If you use LLVM, you also need to decide whether you want to link against LLVM and use the C++ API. Or you can use bindings for another language. Or you can emit IR as plain text, then feed that into the LLVM compiler.
I have done the latter, and would advise against it. The IR is powerful, but quite involved. The documentation is sometimes lacking. It's an interesting project, but you probably won't get far.
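For anyone wondering what the plain-text route looks like, here is a minimal sketch. The function name `emit_binary_function` is made up; the IR it prints is ordinary textual LLVM IR for a two-argument `i32` function. It gets involved quickly once you go past straight-line arithmetic (SSA form, phi nodes, types on every operand), which is where the pain starts.

```python
def emit_binary_function(name: str, op: str) -> str:
    """Return textual LLVM IR for `i32 name(i32 a, i32 b)` applying one op."""
    if op not in {"add", "sub", "mul"}:
        raise ValueError(f"unsupported op: {op}")
    lines = [
        f"define i32 @{name}(i32 %a, i32 %b) {{",
        "entry:",
        f"  %result = {op} i32 %a, %b",
        "  ret i32 %result",
        "}",
    ]
    return "\n".join(lines) + "\n"


ir_text = emit_binary_function("add", "add")
# Write ir_text to a file such as add.ll and hand it to the LLVM tools.
```

Assuming an LLVM install on your PATH, something like `llc add.ll -o add.s` or `clang -c add.ll -o add.o` should turn the resulting file into assembly or an object file.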
Linking against LLVM means you have to use their C++ interface, and every time you rebuild your compiler, it's going to be slow, with all the templates they use. Your compiler will be a large executable (clang and zig are about 90 MB), and you'd have to rebuild that every time you change your compiler. I would hate having to do that in my spare time, which is why I went the other route.
LLVM is very powerful. There's all these optimizations you get, and debug info. But it's not free. The integration is going to be hard and you'll have to spend quite some time on it.
The next time I start a language project, I would build an interpreter in a dynamic language, and possibly even use a parser generator. https://craftinginterpreters.com/ has a nice path laid out. Focus on what you actually want to do. Once you get somewhere, once you actually like your language, once you actually start a project in your own language and have an actual need for more, you can write another compiler with more bells and whistles.
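To give a sense of how little code the interpreter route needs to get going, here is a sketch of a tree-walking evaluator over a made-up tuple AST (this is not the design from Crafting Interpreters, just an illustration). It handles numbers, variables, `let`, `if`, and three arithmetic operators.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, env):
    """Evaluate a tuple-based expression tree in the given environment."""
    if isinstance(expr, (int, float)):
        return expr              # numeric literal
    if isinstance(expr, str):
        return env[expr]         # variable lookup
    op, *args = expr
    if op == "let":
        name, value, body = args
        # Bind name in a new environment, leaving the outer one untouched.
        return evaluate(body, {**env, name: evaluate(value, env)})
    if op == "if":
        cond, then, otherwise = args
        return evaluate(then if evaluate(cond, env) else otherwise, env)
    left, right = (evaluate(a, env) for a in args)
    return OPS[op](left, right)


# let x = 5 in x * (x + 1)
answer = evaluate(("let", "x", 5, ("*", "x", ("+", "x", 1))), {})
```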
Because LLVM is unusable for a dynamic language VM. The compile times are too slow, and it's way too big.
Cranelift is also big, but has sensible staged optimizers, i.e. it doesn't behave like Java or LLVM.
We've been JITing using LLVM for a number of years without an obvious problem. It obviously depends on the size of the code you are wanting to JIT, but the OP was discussing a toy language as a side project, and LLVM is certainly perfect for that use case.
> cranelift is also big, but has sensible staged optimizers
LLVM gives you the freedom to decide what optimizers to use. But it's true it is too big for many applications, even if there are build options for different sizes.
This depends on your use case, I guess.
For long-running numerical code (Julia), LLVM is fine, but it needs some work by highly skilled individuals to make it feel like an interpreted REPL when importing big code bases.