Implementing a LLVM Micro C compiler in Haskell

winter_blue · on June 4, 2021

This is awesome. This will essentially serve as a more up-to-date tutorial on the LLVM bindings for Haskell, so programming language designers could use it to build a new language in Haskell. Currently, the state of LLVM's documentation and tutorials leaves much to be desired. This work remedies that on the Haskell side by quite a bit.

I especially love the fact that instead of implementing a novel language like Kaleidoscope (which the official LLVM tutorial does), it implements a subset of C (a language that's low-level enough and widely understood). C has all of the basic constructs; if you understand how to write a compiler (to LLVM) for them, you can go on to implement far more advanced languages.

Thank you Joseph Morag for this work, and Théophile Choutri, Moritz Kiefer, et al. for making it possible.

pjmlp · on June 4, 2021

Kaleidoscope seems to focus only on C++ nowadays; the OCaml version is no longer there.

https://llvm.org/docs/tutorial/

kubb · on June 4, 2021

This reminds me of a project that I did at university. It was a compiler for my own C-like programming language that went straight to x86 assembly. It had typechecking, basic primitive types, local and global variables, arrays and function calls. My professor recommended Haskell, saying that it'd make the job easier for me, and he was right. The thing basically worked as soon as I got it to compile. I remember that another student used Java for the project, and he ended up with a 10x larger codebase, riddled with bugs.

The code is still on my GitHub. I recently took a look at it and I was surprised how readable it was. It's a shame that in my professional career I didn't get the chance to use either Haskell or any of the PL skills that I picked up in uni, because I really had fun with that project. Though maybe that's for the best. If you have to do it for work, you sometimes end up hating it.

siraben · on June 5, 2021

This is amazing. I tried following Stephen Diehl's JIT compiler in LLVM tutorial[0] a few years ago but it was already outdated (the llvm-hs library changed quite a bit), and subsequent web searches didn't turn up much.

For those interested in tutorials like this, I'd also recommend a very literate Haskell compiler for the PCF language to C[1], which is essentially lambda calculus with some primitives and pattern matching. It details a number of transformations such as closure conversion and lambda lifting.

[0] https://www.stephendiehl.com/llvm/

[1] https://github.com/jozefg/pcf/

HexDecOctBin · on June 5, 2021

Are there any similar tutorials written for the LLDB bindings? The only LLDB documentation I could find was auto-generated Doxygen with no usage code.

helltone · on June 4, 2021

This looks great, are there any similar tutorials written for languages other than Haskell?

andi999 · on June 4, 2021

Why not go for the full C language? C was made to make compilers easy to build. Also, it might help expose difficulties in the chosen approach.

ash_gti · on June 4, 2021

They went with a subset of C; it's enough to write a working executable. Generally, for an introductory tutorial on something as complex as writing a compiler, a subset like the one presented in the article is enough to show how all the pieces work together to produce a working compiler.

mikepurvis · on June 4, 2021

Yup, for the purposes of a demo like this, a simplified top-to-bottom "vertical slice" is of far more value than total feature coverage.

pjmlp · on June 4, 2021

Unless you are talking about something like "A Retargetable C Compiler: Design and Implementation", it is definitely not easy.

https://www.amazon.com/Retargetable-Compiler-Design-Implemen...

During the early '80s, the best home computers could get was Small-C.

https://en.wikipedia.org/wiki/Small-C

randomifcpfan · on June 5, 2021

Small-C was the best one with open source, but there were many others. This one was my favorite: https://www.bdsoft.com/resources/bdsc.html

pjmlp · on June 5, 2021

What I got back in the day came in book form, and it was yet another flavour.

"A book on C"

https://link.springer.com/book/10.1007/978-1-349-10233-4

It uses a K&R C subset with bytecodes and a bytecode -> machine language translation.

I just never bothered to type it in though.

reikonomusha · on June 4, 2021

A conforming ISO C compiler seems extremely difficult to build, and I don’t think ease of writing a compiler is a part of the design these days.

andi999 · on June 4, 2021

They could stick to ANSI C.

woodruffw · on June 4, 2021

ANSI C and ISO C are the same thing, at least for the 1989/90 revision. It’s still not a particularly easy language to implement, at any stage.