
Author here. Yes, the scheme requires whole-program compilation, which is unfortunate. If you’re okay with statically-linked dependencies, there is only the large problem of making a compiler that is adept at incremental re-compilation. However, any monomorphizing compiler faces that challenge, so the problem is not unique to this scheme.



Thanks for confirming, and that is a good point! However, I think that for monomorphizing compilers, programmers often write code in such a way as to avoid triggering a large re-compilation. With monomorphization this might be easier to achieve, since you only need to control how your data structures are instantiated. But with functions, you would need to control how the functions are passed. For example, a use of a Haskell-style `lens` package would probably trigger a re-compilation of `lens -> kan-extensions -> profunctors -> comonad -> base`. Programmers can work around that, but it may place restrictions on what they can easily do with the language.
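
To make that concrete, here is a sketch (in Haskell, with hypothetical names - not from the post) of why call sites feed back into upstream modules under whole-program defunctionalization: a higher-order function gets compiled against a closed datatype enumerating every lambda that ever reaches it.

  -- A whole-program defunctionalizer might generate something like this
  -- for a higher-order function `twice`. `Fn` must enumerate every lambda
  -- passed to `twice` anywhere in the program:
  data Fn = AddOne        -- some call site passes (\x -> x + 1)
          | MulBy Int     -- another passes (\x -> n * x)

  apply :: Fn -> Int -> Int
  apply AddOne    x = x + 1
  apply (MulBy n) x = n * x

  twice :: Fn -> Int -> Int
  twice f x = apply f (apply f x)

If a downstream module later passes a lambda not seen before, `Fn` and `apply` grow a constructor, and everything from this definition down to the new call site must be recompiled.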


Yes - that’s a great observation. The scheme here means that you can end up with compilation dependencies that aren’t reflected in your explicit dependency graph, e.g. via exactly the path you describe.

The same is true of monomorphization in the presence of typeclasses (or traits, concepts, etc.). I have some ideas about how to do such incremental compilation optimally, but I haven’t written them down.

Anyway, I totally agree with you. I don’t mean to suggest this is the best way to do things - if anything, Rust/C++/etc. have taught us that excessive monomorphization is probably not the way to go, for developer-experience reasons. You may be able to imagine an interesting derivative of the scheme presented here built on something like Swift’s “witness table”-based compilation of protocols, which may be much more compile-time performant and support separate compilation. But I don’t even have a sketch of that. This is only one technique, and the design space is very wide.
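
For a flavor of that alternative: a witness table is essentially dictionary passing. A hedged sketch in Haskell (my names, not Swift’s actual ABI):

  -- The callee is compiled once against a record of function pointers,
  -- so no per-type specialization is needed and modules can be compiled
  -- separately.
  data ShowDict a = ShowDict { showIt :: a -> String }

  describe :: ShowDict a -> a -> String   -- compiled exactly once
  describe d x = "value: " ++ showIt d x

  intWitness :: ShowDict Int
  intWitness = ShowDict { showIt = show }

  main :: IO ()
  main = putStrLn (describe intWitness 42)

The cost is an indirect call through the table at runtime instead of a specialized copy per type, which is exactly the compile-time/run-time trade-off above.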


Another question: in the section on eliminating heap-allocated captures, you discuss creating a datatype for captures that is passed by value. But what if that datatype is recursive? For example, how would you compile a CPS-transformed `reverse`:

  fun reverse(xs : list<a>, k : list<a> -> list<a>)
    match xs
      // each recursive call captures x and the previous k in a new closure
      Cons(x, xx) -> reverse(xx, \ys -> Cons(x, k(ys)))
      Nil -> k(Nil)
Here, the `k` that is recursively passed depends on the previous `k`. As such, `k` should be as large as the input list and cannot be stack-allocated?


Yes - if the closure representation is recursive, it must be boxed. I believe your example is also the one given by the original author (William Brandon et al.). I don’t think there is any way around this, because at that point it is the same as any other recursive data type.

Although, if you know the size of the list at compile time, you could imagine compiling the recursive data type “unrolled” into a statically-sized array - but now you need dependent types.
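
To see why, here is the continuation from your example defunctionalized by hand (a Haskell sketch; `Kont` and `applyK` are my names):

  -- The closure \ys -> Cons(x, k(ys)) captures an element and the previous
  -- continuation, so the capture datatype is recursive - a chain exactly as
  -- long as the input list:
  data Kont a = KId               -- the initial continuation \ys -> ys
              | KCons a (Kont a)  -- captures x and the previous k

  applyK :: Kont a -> [a] -> [a]
  applyK KId         ys = ys
  applyK (KCons x k) ys = x : applyK k ys

  reverseCps :: [a] -> Kont a -> [a]
  reverseCps []       k = applyK k []
  reverseCps (x : xs) k = reverseCps xs (KCons x k)

`Kont a` is isomorphic to the list type itself, so it has to be boxed for the same reason a list is.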


Ah, thanks! I couldn't find that paper - is it online somewhere?

You may also be interested in the stack-allocation system in OCaml: https://www.youtube.com/watch?v=yGRn5ZIbEW8 In particular, Stephen gives a nice example at the end of how it can avoid some allocations of lambdas (though it, too, would need to box the closure in the CPS `reverse`).

You may also be interested in

  Jeffrey M. Bell, Françoise Bellegarde, and James Hook. “Type-Driven Defunctionalization.” In ICFP ’97, 25–37. New York, NY, USA: Association for Computing Machinery, 1997. doi:10.1145/258948.258953.
and

  Andrew Tolmach and Dino P. Oliva. “From ML to Ada: Strongly-Typed Language Interoperability via Source Translation.” Journal of Functional Programming 8 (4): 367–412. Cambridge University Press, 1998.
who thought about whole-program defunctionalization before.


The manuscript is currently unpublished, so it’s not available ):

Thanks for the links - I’ve seen them, and it’s interesting that defunctionalization is usually treated as an optimization. The “Type-Driven …” paper is the earliest work I’m aware of that does it over surface types, but it doesn’t “go all the way”.

I’d love to chat more - a link to my email can be found around the post, including at the bottom.


As someone who lives in embedded land but wishes we had better concurrency approaches (as embedded gains more and more cores and performance while remaining low power, and as sequencing tasks and racing to idle become ever more important), whole-program compilation is fine for me haha. We already have to do that for the most part anyway. Awesome ideas - I'm definitely digging in deeper this evening! Heap-less, directly-called closures would be amazing.



