TenDRA is designed for front-ends to compile to ANDF, an "architecture-neutral distribution format" which is kind of like LLVM bitcode but different. Sadly it never really gained any lasting momentum as an open source project, though it currently has a few active maintainers.
LLVM is much much much more than just the IR, e.g. LLVM's scheduling code is sufficiently abstract and compartmentalised that it can be used to (optimistically, last time I checked) predict performance of machine code not generated by LLVM (llvm-mca)
That's entirely possible, but does it hurt their bottom line? They switched from GCC, because LLVM was available (to grow and eventually use).
Their aversion to the GPL seems mostly cultural, which is interesting given that we skewered Microsoft for its sins along those lines 20 years ago now.
Apple has an aversion to GPL 3 and not to GPL 2.x because of what 3 forces people to do. It's the same with FreeBSD: once libgcc_s went to 3 they could no longer use it.
It's the same reason why bash and rsync have not been upgraded on macOS to newer versions: they also started using GPL3.
Honestly curious: what is the thing that "[GPLv]3 forces people to do" in the context of those cases? IANAL, but I'm interested in the perspective for choosing a license for some of my current and future works.
Not lock down people's devices, basically. [L]GPLv3 requires that people are able to replace the [L]GPLv3'd binaries in their own devices if they want to (so they can actually take advantage of the code being FLOSS instead of it being some nice theoretical novelty - i.e. it is nice that my router uses Linux and I find it amusing that it comes with the GPLv2 printed out, but what is the point if I can't open it up and fix/change some things that annoy me?). Apple wants to keep the devices they sell you under tight control, so [L]GPLv3 rubs them the wrong way.
It might also be about patents since IIRC [L]GPLv3 requires that if you use a [L]GPLv3'd program and that program infringes some of your patents then you are not allowed to sue others over their use of your patents in the same program.
GCC would not have taken it anyway; they tried to prevent parts of GCC from being used independently or as a library in order to keep the lock-in to GNU, while LLVM's architecture is one of embeddable, reusable libraries.
I don't understand the bits about RMS' email configuration, but in that message in 2015 he indicated that he did not know about the offer in 2005 to give copyright of LLVM to the FSF.
That table doesn't have greens for apple clang that are missing for regular clang. As far as I know they take some time integrating upstream llvm changes, so the two columns are due to that.
Historically they've been very good about upstreaming. Sometimes it takes a while, but so far everything made it to the public repo.
Definitely got sympathy for that position. Upstreaming changes often takes a long time: getting the context across, getting community buy-in to the changes, getting approval, and cutting a release, whereas internally it's often just a code review, which can be prioritised according to business needs.
It's quite common for long standing forks to exist, where changes get pushed upstream from the fork, and the fork gets periodically re-cut.
> Apple doesn’t upstream all of their changes already, hence why you have a specific Apple column
It looks to me like it’s the exact opposite, based solely on your link that is. Based on your link, they lag _behind_ upstream (vs being _ahead_ of upstream as you indicated) and don’t offer the same features, without a delay anyway, in “Apple Clang” that are in “Clang”.
Hence why I said it was a sign of the future rather than a prediction. It's not going to get any better for software freedom, especially as we move into the cloud.
We live in a post-FSF world; I think people are taking for granted that the bean counters will keep playing nice with open source software.
We need software that works on the client side, sans internet connection. Devices are really powerful these days, so it's funny that things keep moving to the cloud. I've been using cloud storage for the past several years, but I wonder why I don't just back up to an external drive or two. The convenience doesn't buy much. I think folks really buy into laziness and technology hype. But fundamentally not much has changed in the software space compared to 20 years ago.
I still hold out hope that things will get better in the future. Eventually Moore’s law will truly end. We will have to stop once we run up against hard physical limitations.
Why is this a reason for optimism? Because it shifts the outlet for innovation elsewhere. Costs of manufacturing will drop as competition in the fab space increases. I think RISC-V is a sign of things to come. There is great potential for open and amateur hardware.
I wanted to write an LLVM backend for various instruction sets (e.g. Z80, R216[0]), since those would deal with the problems of optimization passes and register allocation for me, but the LLVM tutorial[1] makes it look so goddamn hard. Does anyone know of a tutorial or of a declarative way to write such backends for either QBE or LLVM? The QBE git repository[2] has a few backends but also looks similarly involved.
Comparatively, writing frontends is quite easy these days and many examples of compiling to IR exist.
Writing backends that integrate into compiler stacks like LLVM is a challenge unless one is quite deep into compilers.
If the idea is just to learn how they work, it is easier to dump an IR that is compatible with a macro assembler, then call the assembler with your macro definitions on the generated IR.
It won't win prizes in performance, but one gets to learn how things work and will have a compiler producing proper binaries at the end, and one can fine-tune the macros to improve the quality of the code anyway.
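To make that concrete, here's a minimal sketch in C (the stack-machine design and the op names are made up for illustration): the compiler emits each IR instruction as a macro invocation, and a hand-written macro file for the target assembler (NASM, MASM, ...) expands each one into real instructions.

    #include <stdio.h>

    /* Sketch: emit "x = 2 + 3 * y" as macro invocations. The assembler is
       later run with a macro file defining PUSH_CONST, PUSH_VAR, MUL, ADD
       and STORE_VAR for the target CPU. */
    static void emit_example(FILE *out) {
        fprintf(out, "    PUSH_CONST 2\n");
        fprintf(out, "    PUSH_CONST 3\n");
        fprintf(out, "    PUSH_VAR   y\n");
        fprintf(out, "    MUL\n");
        fprintf(out, "    ADD\n");
        fprintf(out, "    STORE_VAR  x\n");
    }

    int main(void) { emit_example(stdout); return 0; }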
Writing compiler backend in a declarative manner is something I've been toying with for a while now, and I don't think it's realistically possible with current technology.
Backends are hard, unfortunately.
However, since you are targeting the Z80, you might be able to just take the IR and transform it yourself? I haven't touched an LLVM backend for a while, so I don't know what the process is atm, but a lot of a modern compiler is dealing with instruction scheduling and the like, which the Z80 doesn't really have a need for.
LLVM is very good for the industry, but it is a little over-engineered(?). Maybe that's a bit harsh, but the difficulty is definitely thrown at you earlier than hacking on most compilers.
It is not purely declarative, but Fraser & Hanson's lcc comes close while still delivering good optimization -- and has a book describing it (IIRC it was written literate-programming style).
It's been 20 years or more since I played with an older version that required an Icon interpreter as part of the build chain, and I think they had a later one that did away with that requirement.
The code generator is basically built from a declarative description of tree matching/rewriting templates with a cost model, and the compiler will find the optimal-cost match.
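For a flavour of that declarative style (the rule syntax here is invented for illustration, not lcc's actual lburg grammar), the rules are essentially data: each one pairs a tree pattern with a cost and an instruction template, and the generator picks the cheapest cover of each IR tree.

    /* Sketch of cost-driven, declarative instruction selection rules. */
    struct rule {
        const char *pattern;  /* tree shape to match               */
        int         cost;     /* relative cost of the emitted code */
        const char *emit;     /* instruction template              */
    };

    const struct rule rules[] = {
        { "reg: ADD(reg, CONST)", 1, "add %0, %1, #%c" },
        { "reg: ADD(reg, reg)",   1, "add %0, %1, %2"  },
        { "reg: MUL(reg, reg)",   3, "mul %0, %1, %2"  },
        { "reg: CONST",           1, "li  %0, #%c"     },
    };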
> Writing compiler backend in a declarative manner is something I've been toying with for a while now, and I don't think it's realistically possible with current technology.
Functional style isn't too bad. Maximal munch lines up with pattern matching, then linear scan is easy enough if you don't need gcc-tier results.
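A minimal sketch of the maximal-munch part, in C rather than a functional language (node types, opcodes and the target instructions are all made up): try the biggest tree pattern first, fall back to smaller ones, and recurse on whatever the chosen pattern doesn't cover.

    #include <stdio.h>

    typedef enum { ADD, MUL, CONST } Op;

    typedef struct Node {
        Op op;
        int value;            /* used when op == CONST */
        struct Node *l, *r;   /* operands              */
    } Node;

    static int new_temp(void) { static int n; return n++; }

    /* Returns the virtual register holding the subtree's result. */
    static int munch(Node *n) {
        if (n->op == ADD && n->r->op == CONST) {   /* biggest pattern first */
            int x = munch(n->l), t = new_temp();
            printf("  add t%d, t%d, #%d\n", t, x, n->r->value);
            return t;
        }
        if (n->op == ADD || n->op == MUL) {        /* generic binary op */
            int x = munch(n->l), y = munch(n->r), t = new_temp();
            printf("  %s t%d, t%d, t%d\n", n->op == ADD ? "add" : "mul", t, x, y);
            return t;
        }
        int t = new_temp();                        /* smallest pattern: CONST */
        printf("  li  t%d, #%d\n", t, n->value);
        return t;
    }

    int main(void) {
        /* (1 * 2) + 3 */
        Node c1 = { CONST, 1, 0, 0 }, c2 = { CONST, 2, 0, 0 }, c3 = { CONST, 3, 0, 0 };
        Node mul = { MUL, 0, &c1, &c2 };
        Node add = { ADD, 0, &mul, &c3 };
        munch(&add);
        return 0;
    }

Linear scan can then run over the t0..tN virtual registers the munch pass produced.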
GHC has bits of its backend that are shared across architectures, maybe I should have a look at that...
It's certainly not trivial, but still an interesting journey. I made a backend for MMIX, heavily influenced by the RISC-V implementation, and kept the patches logically consistent by rebasing new changes, so they can serve as an example. The last rebase was a year ago, but I think following a previous implementation is the best way forward, as there is so much boilerplate:
- https://github.com/stuij/mmix-llvm-support
- https://github.com/stuij/mmix-llvm-backend
That said, those kinds of olden CPUs aren't a terribly snug fit for LLVM, due to their peculiarities. You'll have to bend LLVM quite a bit to your will, which also means that you need to get quite familiar with it as well.
I’m going to be building a compiler over Christmas break, and I was thinking of targeting the LLVM toolset. This will be my second compiler; the first one emitted x86 directly. Should I consider switching to QBE? I’m more focused on the compiler passes that I want to write; register allocation, etc. is not my thing. I’m going to be working with CPS-conversion and stuff like that. Does anyone have any good tutorials for using either system?
Sounds like you want to compile a functional language. The best tutorial (in terms of being self-contained and being understandable in entirety) I've seen to compile FP to C is[0], I'd imagine that retargeting to LLVM wouldn't be that difficult either.
That’s a great idea honestly. Even Rust would make a good compile target, the strong typing could help debug the compiler as you go. Although LLVM assembly is a good choice too!
Are there any big downsides to compiling to C instead of LLVM? I would assume that it would be less performant but it feels like a shallow preconception.
It's somewhat more constricting, certain types of compiler magic are hard to reproduce in plain C. For instance, dealing with things that need to be stack-aware, such as a GC or tail recursion, takes epic hacks like the Boehm GC or the "Cheney on the MTA" trick. You're also stuck with the C calling convention.
Plus, while generating C is simple, you're entirely on your own. With LLVM IR, you benefit from its infrastructure, the optimization passes and other LLVM-based tools, so you don't need to reinvent the wheel as much. In the end, you'll need some kind of IR anyway, and LLVM is a good place to start.
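For the curious, here is a very rough sketch of the Cheney-on-the-MTA shape mentioned above (all names are made up and the copying collector itself is elided): CPS-converted functions never return, and when the C stack gets too deep, live data gets evacuated to the heap and a longjmp resets the stack.

    #include <setjmp.h>
    #include <stddef.h>

    #define STACK_LIMIT (64 * 1024)    /* how much C stack we allow to pile up */

    typedef void (*cont_fn)(void *env);

    static jmp_buf trampoline;
    static char   *stack_base;
    static cont_fn saved_k;
    static void   *saved_env;
    static long    steps;

    /* Called on entry to every "compiled" function. Assumes a downward-growing
       stack; the pointer arithmetic is non-portable in theory but standard
       practice. A real implementation would also evacuate live objects here. */
    static void check_stack(cont_fn k, void *env) {
        char here;
        if (stack_base - &here > STACK_LIMIT) {
            saved_k   = k;
            saved_env = env;
            longjmp(trampoline, 1);    /* discard the piled-up C frames */
        }
    }

    /* A compiled function: it continues by calling, never by returning, so
       (absent tail-call optimisation) frames pile up until check_stack trips. */
    static void loop_step(void *env) {
        check_stack(loop_step, env);
        if (++steps >= 1000000) return;   /* sketch only: stop eventually */
        loop_step(env);
    }

    int main(void) {
        char base;
        stack_base = &base;
        saved_k    = loop_step;
        saved_env  = NULL;
        if (setjmp(trampoline)) { /* we land here after every stack reset */ }
        saved_k(saved_env);
        return 0;
    }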
> Are there any big downsides to compiling to C instead of LLVM?
Well, for starters, you have to import all of C's undefined behavior, so it would be basically impossible to implement, e.g., 2's complement signed integer arithmetic.
Strictly speaking it isn't impossible, just much slower, since you'd have to implement in software what the hardware does.
In practice, though, pretty much every compiler that exploits that undefined behaviour as an optimization has a flag to disable it (e.g. GCC and Clang's -fwrapv), and chances are you'll need to use other system- and/or compiler-specific stuff anyway.
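For example, wrapping two's-complement addition can be written in C with no UB at all by doing the arithmetic in unsigned, where wraparound is defined (a small sketch; the helper name is made up):

    #include <inttypes.h>
    #include <stdio.h>

    /* Unsigned arithmetic wraps by definition; converting the result back to
       int32_t is implementation-defined (not undefined) in C99/C11, and is
       two's complement on every mainstream compiler. */
    int32_t add_wrap_i32(int32_t a, int32_t b) {
        return (int32_t)((uint32_t)a + (uint32_t)b);
    }

    int main(void) {
        printf("%" PRId32 "\n", add_wrap_i32(INT32_MAX, 1));  /* -2147483648 */
        return 0;
    }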
LLVM is much better documented than any other compiler system, so I would just KISS and use it. LLVM is a beast, though, but to get started I'd want to focus on my side of the compiler first (the sema is where the pain is, imo).
Remember that there's no reason why you can't have an IR in between your language and the backend.
If your language fits (based on the CPS it probably won't, but still), you can use a subset of it as a representation of that IR and use lowering to rewrite expensive constructs into smaller ones, greatly reducing the surface area you have to work with.
Would you rather use a project where you can at the very least copy and paste from another codebase, or - for what is basically the easy bit of the compiler - fiddle around with getting a much less mature project working (possibly a more suitable backend in the long run, but you don't know that yet)?
Better documented doesn't mean the documentation is perfect. My experience two years ago was: You're going to have questions, and it's going to be difficult to find answers.
You have other options. You could design your own bytecode, and write an interpreter. You could emit C, and use a C compiler. If you use LLVM, you also need to decide whether you want to link against LLVM and use the C++ API. Or you can use bindings for another language. Or you can emit IR as plain text, then feed that into the LLVM compiler.
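As a deliberately tiny sketch of that last option (file names and the emitted function are made up), the compiler can just print a .ll file and shell out to clang, which accepts textual IR directly:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        FILE *f = fopen("out.ll", "w");
        if (!f) return 1;
        /* A trivial function in textual LLVM IR. */
        fprintf(f, "define i32 @add(i32 %%a, i32 %%b) {\n");
        fprintf(f, "entry:\n");
        fprintf(f, "  %%sum = add i32 %%a, %%b\n");
        fprintf(f, "  ret i32 %%sum\n");
        fprintf(f, "}\n");
        fclose(f);
        /* clang takes .ll input; -c produces an object file to link later. */
        return system("clang -O2 -c out.ll -o out.o");
    }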
I have done the latter, and would advise against it. The IR is powerful, but quite involved. The documentation is sometimes lacking. It's an interesting project, but you probably won't get far.
Linking against LLVM means you have to use their C++ interface, and every time you rebuild your compiler, it's going to be slow, with all the templates they use. Your compiler will be a large executable (clang and zig are about 90 MB), and you'd have to rebuild that every time you change your compiler. I would hate having to do that in my spare time, which is why I went the other route.
LLVM is very powerful. There's all these optimizations you get, and debug info. But it's not free. The integration is going to be hard and you'll have to spend quite some time on it.
The next time I would start a language project I would build an interpreter in a dynamic language, and possibly even use a parser generator. https://craftinginterpreters.com/ has a nice path laid out. Focus on what you actually want to do. Once you get somewhere, once you actually like your language, once you actually start a project in your own language, and have an actual need for more, you can write another compiler with more bells and whistles.
Because llvm is unusable for a dynamic language VM. The compile times are too slow, and it's way too big.
cranelift is also big, but has sensible staged optimizers. I.e. it doesn't behave like java or llvm.
We've been JITing using LLVM for a number of years without an obvious problem. It obviously depends on the size of the code you are wanting to JIT, but the OP was discussing a toy language as a side project, and LLVM is certainly perfect for that use case.
> cranelift is also big, but has sensible staged optimizers
LLVM gives you the freedom to decide what optimizers to use. But it's true it is too big for many applications, even if there are build options for different sizes.
This depends on your use case, I guess.
For long-running numerical code (Julia) LLVM is fine, but it needs some work by highly skilled individuals to make it feel like an interpreted REPL when importing big code bases.
I wish something like this existed for garbage-collected languages. The closest thing I know of is the JVM (and other similar VMs), but I'm hoping for ahead-of-time compilation.
The key difficulty in adding garbage collection to code emitted with tools like this is being able to accurately tell what the root set is.
(I know LLVM has support for this in theory, but I have found nothing that tells how to actually use it)
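FWIW, one well-known workaround (LLVM's docs describe the idea as a "shadow stack") is to not rely on the backend at all: the generated code keeps an explicit side stack of pointers to every local that may hold a heap reference, so the collector can enumerate roots exactly. A minimal sketch, with all names made up:

    #include <stddef.h>

    typedef struct Obj Obj;               /* heap object header elided */

    #define MAX_ROOTS 1024
    static Obj **shadow_stack[MAX_ROOTS]; /* addresses of live local slots */
    static size_t shadow_top;

    static void push_root(Obj **slot) { shadow_stack[shadow_top++] = slot; }
    static void pop_roots(size_t n)   { shadow_top -= n; }

    /* Root enumeration for the collector is now trivial and exact. */
    static void mark_roots(void (*mark)(Obj *)) {
        for (size_t i = 0; i < shadow_top; i++)
            if (*shadow_stack[i]) mark(*shadow_stack[i]);
    }

    /* What generated code for a function with two GC'd locals might look like. */
    static Obj *example(Obj *arg) {
        Obj *a = arg, *b = NULL;
        push_root(&a);
        push_root(&b);
        /* ... allocations here may trigger a collection; a and b stay visible ... */
        pop_roots(2);
        return a;
    }

    static void mark_noop(Obj *o) { (void)o; }

    int main(void) {
        example(NULL);
        mark_roots(mark_noop);
        return 0;
    }

It costs a couple of stores per call, but it sidesteps having to teach the backend anything about your GC.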
There are LLVM backed languages that provide garbage collection; Julia and Crystal are the first two that come to mind. Haskell also has an LLVM back-end.
I thought this wasn't universally available (that it was only in a special version of the JDK - https://openjdk.java.net/projects/graal/). But it appears to be everywhere, albeit experimentally.
Elaborating: Until recently, your best option for running CIL in an AOT-type environment would be Mono AOT, which has its own limitations.
Nowadays, however, we do have CoreRT in a state that lets you do AOT, still with limitations around things like reflection. But, it's fairly doable nowadays, I would think.
Doesn't this defeat the purpose of using SSA in the first place? If this form is allowed, then I cannot rely on `%v` having an immutable value, which is much of the reason why we use SSA at all.
I think LLVM's IR documentation is not easy to read, and this one is no better.
I don't want to reimplement a compiler for the IR language; I want to know how to use it, how to generate code with it. I need to see examples, e.g. how to implement threading, coroutines, weakly typed variables, reflection ... not the BNF.
I am also working on a compiler and need to decide on a backend soon. I currently plan to use C (clang) as IL in the first iteration, with the option to upgrade to LLVM IL later.
I wonder whether there is any easier alternative. I definitely need to start with a framework that takes text input, because everything else would be hard to integrate with the TypeScript-based interpreter of my language. And it needs to work with a runtime written in C. QBE looks nicer than LLVM's IL, but I wonder whether it would cause more trouble in the end.
Since this version of your language will never be used by thousands of users (if you ever hit success you’ll want to rewrite your compiler anyway), QBE is a lot simpler to use than LLVM. The main limitation is the limited set of target architectures, and probably the slow development. In effect, Linux AMD64 and ARM64 are the only well-supported architectures. Also, you may be able to compile to Windows using the mingw assembler.
Sure, but the idea is to avoid C and compile to a simpler IR, like QBE, and then compile that to native code in 1/10 of the time and memory that LLVM or GCC take.
> it is programmed in non-fancy C99 without any dependencies.
These are contradictions. C99 is not a "very hackable" language. If the compiler isn't designed to make fast binaries, then they should have written it in a higher-level language that has fewer foot guns.
Notable mentions,
IBM's research project on PL.8 while developing their first RISC designs.
https://rsim.cs.uiuc.edu/arch/qual_papers/compilers/auslande...
https://pdfs.semanticscholar.org/3288/fc042cd474f0ec93d67753...
https://rishiheerasing.net/modules/hca2102/paper/cocke.pdf
The Amsterdam Compiler Kit,
http://tack.sourceforge.net/
https://github.com/davidgiven/ack
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12...
The more the merrier.