Hacker News
Monads to Machine Code (Part 1) (stephendiehl.com)
151 points by primodemus on Jan 5, 2016 | 12 comments



What's cool about monads is that the computation (assembly code in this particular case) is just a plain old value.

So if you have a fragment like this:

    factorial :: Int64 -> X86 ()
    factorial n = do
      mov rcx (I n)
      mov rax (I 1)
      l1 <- label
      mul rcx
      loop l1
      ret
you can just compose this into a bigger program:

    two_factorials :: Int64 -> Int64 -> X86 ()
    two_factorials n1 n2 = do
      factorial n1
      factorial n2
Notice how the labels won't get mixed up. This concept (representing computation as values) is incredibly powerful: you can compose, combine, and abstract values into bigger values. For example, compare the compositional properties of RethinkDB's query language to SQL.
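One way to see why the labels can't collide is to sketch such a monad yourself. This is a hypothetical reconstruction, not the article's actual types: `X86` here is just a State monad over a fresh-label counter and an instruction list, so every call to `label` hands out a new number.

```haskell
-- Minimal sketch (not the article's implementation): an assembler monad
-- as State over (next fresh label, instructions emitted so far).
import Control.Monad.State

type Label = Int

data Instr = Mov String Int | Mul String | Loop Label | Lbl Label | Ret
  deriving (Show, Eq)

type X86 = State (Int, [Instr])

-- Append one instruction to the stream.
emit :: Instr -> X86 ()
emit i = modify (\(n, is) -> (n, is ++ [i]))

-- Mark the current position with a fresh label and return it.
-- Because the counter lives in the monad's state, composed fragments
-- can never reuse each other's labels.
label :: X86 Label
label = do
  (n, is) <- get
  put (n + 1, is ++ [Lbl n])
  return n

factorial :: Int -> X86 ()
factorial n = do
  emit (Mov "rcx" n)
  emit (Mov "rax" 1)
  l1 <- label
  emit (Mul "rcx")
  emit (Loop l1)
  emit Ret

assemble :: X86 () -> [Instr]
assemble m = snd (execState m (0, []))
```

Running `assemble (factorial 5 >> factorial 3)` yields one instruction list in which the two loops carry distinct labels (0 and 1), which is exactly the "computation is a plain old value" point.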


Correct me if I'm wrong, but isn't this basically the same as an HTML DSL, i.e. functions which transform something like (in LISP)

  (html
    (head
      (title "hello"))
    (body
      (p "world")))
into "<html><head><title>hello</title></head><body><p>world</p></body></html>"? Except this takes assembly keywords and outputs machine code.
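The same analogy can be sketched monadically in Haskell: a hypothetical `Html` monad (loosely modeled on libraries like blaze-html; the `tag`/`text` helpers here are made up) whose actions append markup, so nesting do-blocks nests tags.

```haskell
-- Sketch of the analogy: an Html monad whose actions append markup.
import Control.Monad.Writer

type Html = Writer String

-- Wrap a nested builder in an opening and closing tag.
tag :: String -> Html a -> Html a
tag t body = do
  tell ("<" ++ t ++ ">")
  r <- body
  tell ("</" ++ t ++ ">")
  return r

text :: String -> Html ()
text = tell

page :: Html ()
page = tag "html" $ do
  tag "head" (tag "title" (text "hello"))
  tag "body" (tag "p" (text "world"))

render :: Html () -> String
render = execWriter
```

`render page` produces the flat string `"<html><head><title>hello</title></head><body><p>world</p></body></html>"`, just like the Lisp example; the X86 monad in the article does the same with instructions instead of tags.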


Yes, there is some similarity in the expressive power of Lisp macros and Haskell monads (edit: raw Lisp macros are even more powerful, since they can glue and transform anything into anything). However, there is a huge difference in essence: Lisp macros work at the syntactic level, while monads compose at the semantic level.

Lisp macros are not bound by types, so they can be combined regardless of a program's semantics, potentially producing any kind of incorrect code as a result. Such errors get especially tricky with several layers of quasiquotation.

In Haskell, a program's meaning is partially encoded in its types, so the compiler understands your intention much better and assists you in getting it right. Getting some kinds of inter-part optimization is also much easier in Haskell: you know exactly what kinds of pieces you are gluing together.


Yes, and you can even put whole-block optimizations between the statements being glued (that's the magic of custom composition primitives, i.e. monads).
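Since the assembled program is just a value, such an optimization is an ordinary function over it. A minimal sketch (the `Instr` type and the pass are hypothetical, not from the article) is a peephole rewrite that drops a push immediately followed by a pop of the same register:

```haskell
-- Hypothetical instruction type for the sketch.
data Instr = Push String | Pop String | Mov String String
  deriving (Show, Eq)

-- Peephole pass: `push r; pop r` is a no-op, so delete the pair and
-- keep scanning; everything else passes through unchanged.
peephole :: [Instr] -> [Instr]
peephole (Push r : Pop r' : rest)
  | r == r' = peephole rest
peephole (i : rest) = i : peephole rest
peephole [] = []
```

For example, `peephole [Push "rax", Pop "rax", Mov "rbx" "rcx"]` reduces to `[Mov "rbx" "rcx"]`. Because the monad produces a plain list, a pass like this can be slotted in anywhere before the bytes are emitted.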


Always fun to see this type of thing implemented in Haskell.

Here are some other examples.

This one uses indexed monads to keep the instruction stream correct:

https://intoverflow.wordpress.com/2010/05/21/announcing-pote...

And this one is very similar to the OP but from 2013, and emits 6502 assembly: http://wall.org/~lewis/2013/10/15/asm-monad.html (of course without the JIT portion).


A bunch of people on HN were asking where to learn the conventions of x86 ASM - this is actually a pretty good place to start. He picked the 8 important registers and the 3 forms of addressing, demonstrated the hi/lo (ah/al) -> ax -> eax -> rax evolution, and kept to the acc, count, data, GP conventions for eax, ecx, edx, ebx, which is often overlooked (you can tell which era of x86 asm I grew up in ;)).

I'm scared this has the potential to be information overload for people who aren't familiar with an assembly language and Haskell, but for me this was a terribly neat (both in the "interesting" and the "structurally well done" sense) article. As usual, fantastic work by Stephen. Up there with Bartosz and Yang in quality of write-ups.


I don't know Haskell (yet) but I do have a small amount of assembly experience, and I managed to follow along fairly well. However, I deliberately didn't worry about fully understanding the monadic side of things.


Awesome article!

Seems like a reasonably complete assembler too, with one notable exception: forward jumps, conditional ones specifically. These may be a tad tricky to fit into a monad, because they require either knowing the future (e.g. how far away the jump target is) or rewriting the past (e.g. leaving a hole and filling it in later).

I'd really like to see both approaches in Haskell, since they're kinda different. My guess is that in the former case you'd have nested/delimited monads with buffers. However, I don't know Haskell well enough to figure out how to accomplish the hole-filling approach.

EDIT: On second thought, I guess both approaches (traditionally "two-pass" vs. "one-pass" assemblers) can be accomplished by making labels opaque rather than addresses, and storing a symbol table in either JITMem or the two types that would replace it for a two-pass approach.
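The two-pass idea can be sketched with opaque labels and a symbol table. Everything below is hypothetical (independent of the article's JITMem type): pass one records each label's address, pass two resolves symbolic jump targets against that table, so forward jumps fall out for free.

```haskell
import qualified Data.Map as Map

-- Symbolic program: labels are opaque names, not addresses.
data Instr = Jmp String | Define String | Op String

-- Resolved program: jump targets are concrete addresses.
data Resolved = RJmp Int | ROp String
  deriving (Show, Eq)

assemble :: [Instr] -> [Resolved]
assemble prog = concatMap resolve prog
  where
    -- Pass 1: walk the program counting addresses; a Define occupies
    -- no space, it just names the address of whatever follows it.
    symtab = Map.fromList (go 0 prog)
    go _ [] = []
    go addr (Define l : rest) = (l, addr) : go addr rest
    go addr (_ : rest) = go (addr + 1) rest
    -- Pass 2: replace each symbolic target with its recorded address.
    resolve (Define _) = []
    resolve (Jmp l)    = [RJmp (symtab Map.! l)]
    resolve (Op s)     = [ROp s]
```

A forward jump like `assemble [Jmp "end", Op "nop", Define "end", Op "ret"]` resolves to `[RJmp 2, ROp "nop", ROp "ret"]` even though `"end"` is defined after the jump that uses it.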


Look here: http://www.eliza.ch/doc/wadler92essence_of_FP.pdf (page 11)

The state in the monad in the quoted paper flows backward. The part of an assembler that works with label addresses can use that trick from 1992.


I haven't looked at this one in detail, but MonadFix and the recursive do syntax (https://wiki.haskell.org/MonadFix) will probably solve this problem by allowing forward references.
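A small sketch of what that looks like, under the assumption that the assembler is the lazy State monad (whose MonadFix instance ties the knot); the `emit`/`here` helpers are made up for illustration. With RecursiveDo, a label can be bound before the code that defines it:

```haskell
{-# LANGUAGE RecursiveDo #-}
-- Control.Monad.State defaults to the *lazy* State monad, which has a
-- MonadFix instance; that is what makes mdo's forward reference work.
import Control.Monad.State

type Addr = Int

-- State is (current address, emitted lines).
emit :: String -> State (Addr, [String]) ()
emit s = modify (\(a, out) -> (a + 1, out ++ [s]))

-- Return the current address.
here :: State (Addr, [String]) Addr
here = gets fst

program :: State (Addr, [String]) ()
program = mdo
  emit ("jmp " ++ show end)  -- forward reference: `end` is bound below
  emit "nop"
  end <- here                -- address computed after the jump is emitted
  emit "ret"

run :: [String]
run = snd (execState program (0, []))
```

This works because the address arithmetic never depends on the rendered strings, so the recursive binding is productive; `run` comes out as `["jmp 2", "nop", "ret"]` with the forward target already filled in.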


That'll do for the hole-filling approach :-)


You can do even more with Arrows. This very short presentation goes from Monads to Arrows, and explains why: https://www.dropbox.com/s/e3la5wueg1zsid7/Arrows%20for%20CPU...



