I’m going to be building a compiler over Christmas break, and I was thinking of ...

siraben · on Dec 2, 2020

Sounds like you want to compile a functional language. The best tutorial I've seen for compiling FP to C (in terms of being self-contained and understandable in its entirety) is [0]. I'd imagine that retargeting to LLVM wouldn't be that difficult either.

[0] https://github.com/jozefg/pcf

arcticbull · on Dec 2, 2020

That’s a great idea honestly. Even Rust would make a good compile target, the strong typing could help debug the compiler as you go. Although LLVM assembly is a good choice too!

worldsayshi · on Dec 2, 2020

Are there any big downsides to compiling to C instead of LLVM? I would assume that it would be less performant but it feels like a shallow preconception.

arc-in-space · on Dec 2, 2020

It's somewhat more constricting: certain types of compiler magic are hard to reproduce in plain C. For instance, dealing with things that need to be stack-aware, such as a GC or tail recursion, takes epic hacks like the Boehm GC or the "Cheney on the MTA" trick. You're also stuck with the C calling convention.

Plus, while generating C is simple, you're entirely on your own. With LLVM IR, you benefit from its infrastructure, the optimization passes and other LLVM-based tools, so you don't need to reinvent the wheel as much. In the end, you'll need some kind of IR anyway, and LLVM is a good place to start.
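To make the tail-recursion point concrete, here is a minimal sketch of a trampoline, a simpler relative of the tricks named above (it is not "Cheney on the MTA" itself). It is written in Python for brevity; generated C would do the same thing with function pointers. The names `trampoline` and `count_down` are made up for illustration.

```python
def trampoline(step, *args):
    """Run `step`, then keep calling whatever it returns until a non-callable value appears."""
    result = step(*args)
    while callable(result):
        result = result()
    return result


def count_down(n, acc=0):
    # Instead of making the tail call directly, return it as a thunk.
    # The driver loop above makes the call, so stack depth stays constant.
    if n == 0:
        return acc
    return lambda: count_down(n - 1, acc + n)


# 100,000 nested calls would overflow the default Python stack if made directly.
print(trampoline(count_down, 100_000))
```

The cost is an extra indirection on every tail call, which is part of why this kind of thing feels like a hack when the target is plain C.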

jcranmer · on Dec 2, 2020

> Are there any big downsides to compiling to C instead of LLVM?

Well, for starters, you have to import all of C's undefined behavior, so it would be basically impossible to implement, e.g., 2's complement signed integer arithmetic.

badsectoracula · on Dec 2, 2020

Strictly speaking it isn't impossible, just much slower, since you'd have to implement in software what the hardware does.

In practice, though, pretty much every compiler that uses that as an optimization has a flag to disable it, and chances are you'll need other system- and/or compiler-specific stuff anyway.
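A rough sketch of one common workaround, assuming the source language wants 32-bit wrapping addition: have the code generator emit the addition on unsigned operands, where overflow is well defined in C, and cast back. The helper names are hypothetical. The cast back to `int32_t` is implementation-defined rather than undefined; GCC documents it as reduction modulo 2^N. The flag alluded to above is `-fwrapv` on GCC and Clang.

```python
def wrap_i32(x):
    """Reference model: reduce a Python int to a signed 32-bit two's complement value."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


def emit_wrapping_add(lhs, rhs):
    """Return C source text that adds two int32_t operands with wraparound.

    The addition is done on uint32_t, so it cannot trigger signed-overflow UB.
    """
    return f"(int32_t)((uint32_t)({lhs}) + (uint32_t)({rhs}))"


print(emit_wrapping_add("a", "b"))
print(wrap_i32(2**31 - 1 + 1))  # what the emitted C is expected to compute for INT32_MAX + 1
```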

ashton314 · on Dec 2, 2020

This looks great! I’m taking a different approach than this compiler, but this looks like a good resource. Thanks!

mhh__ · on Dec 2, 2020

LLVM is much better documented than any other compiler system, so I would just KISS and use it. LLVM is a beast, though, so to get started I'd want to focus on my side of the compiler first (the sema is where the pain is, imo).

Remember that there's no reason why you can't have an IR in between your language and the backend.

If your language fits, (based on the CPS it probably won't but still) you can use a subset of it as a representation of that IR and use lowering to rewrite expensive constructs into smaller ones, greatly reducing the surface area you have to work with.
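As a sketch of what such a lowering step might look like, assume a toy expression language represented as nested tuples, where `and`, `or`, and `not` operate on booleans only. The pass below rewrites all three into `if`, so later stages only have to handle `if`. The tuple encoding and the `lower` name are assumptions made for this example.

```python
def lower(expr):
    """Rewrite `and`, `or`, and `not` into `if`, leaving a smaller core language."""
    if not isinstance(expr, tuple):
        return expr  # literals and variable names are already core forms
    op, *args = expr
    args = [lower(a) for a in args]  # lower sub-expressions first
    if op == "and":
        a, b = args
        return ("if", a, b, False)
    if op == "or":
        a, b = args
        return ("if", a, True, b)
    if op == "not":
        (a,) = args
        return ("if", a, False, True)
    return (op, *args)  # everything else passes through unchanged


print(lower(("and", "x", ("not", "y"))))
```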

hollerith · on Dec 2, 2020

KISS is an acronym for 'keep it simple', which is a surprising choice of words when advocating for the much more complicated choice.

mhh__ · on Dec 2, 2020

Would you rather use a project where you can, at the very least, copy and paste from another codebase, or, for what is basically the easy bit of the compiler, fiddle around with getting a much less mature project working (possibly a more suitable backend in the long run, but you don't know that yet)?

ahaferburg · on Dec 3, 2020

Better documented doesn't mean the documentation is perfect. My experience two years ago was: You're going to have questions, and it's going to be difficult to find answers.

LLVM is not KISS. Very, very far from it.

jart · on Dec 2, 2020

Try https://github.com/rui314/chibicc

ahaferburg · on Dec 3, 2020

You have other options. You could design your own bytecode, and write an interpreter. You could emit C, and use a C compiler. If you use LLVM, you also need to decide whether you want to link against LLVM and use the C++ API. Or you can use bindings for another language. Or you can emit IR as plain text, then feed that into the LLVM compiler.

I have done the latter, and would advise against it. The IR is powerful, but quite involved. The documentation is sometimes lacking. It's an interesting project, but you probably won't get far.
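For a sense of what the plain-text route involves at its very simplest, here is a minimal sketch that turns a nested arithmetic expression into textual LLVM IR. It assumes 32-bit integers only, and that operators are already spelled as LLVM instruction names (`add`, `sub`, `mul`); `emit_function` is a made-up helper. Real programs need types, control flow, and calls, which is where it gets involved.

```python
import itertools


def emit_function(name, expr):
    """Return LLVM IR text for a function that evaluates `expr` and returns it as i32."""
    lines = []
    fresh = itertools.count()  # source of unique temporary names

    def emit(node):
        if isinstance(node, int):
            return str(node)  # integer constants can be used as operands directly
        op, lhs, rhs = node
        left, right = emit(lhs), emit(rhs)
        tmp = f"%t{next(fresh)}"
        lines.append(f"  {tmp} = {op} i32 {left}, {right}")
        return tmp

    result = emit(expr)
    body = "\n".join(lines)
    return f"define i32 @{name}() {{\nentry:\n{body}\n  ret i32 {result}\n}}\n"


print(emit_function("main", ("mul", ("add", 1, 2), 3)))
```

The output can be saved to a `.ll` file and handed to LLVM's tools, for example `lli` to run it or `clang` to compile it.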

Linking against LLVM means you have to use their C++ interface, and every time you rebuild your compiler, it's going to be slow, with all the templates they use. Your compiler will be a large executable (clang and zig are about 90 MB), and you'd have to rebuild that every time you change your compiler. I would hate having to do that in my spare time, which is why I went the other route.

LLVM is very powerful. There's all these optimizations you get, and debug info. But it's not free. The integration is going to be hard and you'll have to spend quite some time on it.

The next time I would start a language project I would build an interpreter in a dynamic language, and possibly even use a parser generator. https://craftinginterpreters.com/ has a nice path laid out. Focus on what you actually want to do. Once you get somewhere, once you actually like your language, once you actually start a project in your own language, and have an actual need for more, you can write another compiler with more bells and whistles.
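As a sketch of the "design your own bytecode and write an interpreter" option, here is about the smallest version that still has both halves: a compiler from nested tuples to stack-machine instructions, and a loop that executes them. The opcode names and tuple encoding are invented for this example and are not taken from Crafting Interpreters.

```python
def compile_expr(node, code=None):
    """Flatten a nested expression into a list of stack-machine instructions."""
    code = [] if code is None else code
    if isinstance(node, int):
        code.append(("PUSH", node))
    else:
        op, lhs, rhs = node
        compile_expr(lhs, code)  # operands first, in order
        compile_expr(rhs, code)
        code.append((op,))       # then the operator that consumes them
    return code


def run(code):
    """Execute the instruction list and return the value left on the stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
            continue
        rhs, lhs = stack.pop(), stack.pop()
        if instr[0] == "ADD":
            stack.append(lhs + rhs)
        elif instr[0] == "SUB":
            stack.append(lhs - rhs)
        elif instr[0] == "MUL":
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unknown opcode {instr[0]}")
    return stack.pop()


program = compile_expr(("MUL", ("ADD", 1, 2), 3))
print(program)
print(run(program))
```

Everything here fits in one file and rebuilds instantly, which is the main appeal over linking against a large backend while the language itself is still changing.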

latenightcoding · on Dec 2, 2020

cranelift is a better alternative.

Rochus · on Dec 2, 2020

rurban · on Dec 2, 2020

Because llvm is unusable for a dynamic language VM. The compile-times are too slow, it's way too big. cranelift is also big, but has sensible staged optimizers. I.e. it doesn't behave like java or llvm.

cesaref · on Dec 2, 2020

We've been JITing using LLVM for a number of years without an obvious problem. It obviously depends on the size of the code you are wanting to JIT, but the OP was discussing a toy language as a side project, and LLVM is certainly perfect for that use case.

Rochus · on Dec 2, 2020

> cranelift is also big, but has sensible staged optimizers

LLVM gives you the freedom to decide what optimizers to use. But it's true it is too big for many applications, even if there are build options for different sizes.

freemint · on Dec 2, 2020

This depends on your use case, I guess. For long-running numerical code (Julia), LLVM is fine, but it needs some work by highly skilled individuals to make it feel like an interpreted REPL when importing big code bases.

jules · on Dec 2, 2020

Consider generating WASM.

MaxBarraclough · on Dec 2, 2020

What was your first compiler?

ashton314 · on Dec 2, 2020

This little lisp to x86:

https://github.com/ashton314/lambda-x86

New one:

https://github.com/ashton314/christmas-compiler