Demystifying MLsub – The Simple Essence of Algebraic Subtyping

siraben · on July 26, 2020

While some of the formal notions of type theory can seem daunting, I personally have found it extraordinarily useful in being able to understand the different expressiveness of type systems in programming languages. There are languages that clearly implement say, parametric polymorphism well (ML-style languages) and safely compared to far more ad-hoc approaches, such as C++.

At the same time it's very cool to be able to grasp more advanced type systems (e.g. dependent types) that haven't been popularized yet and imagine how they could lead to radically different methods to software engineering.

I like Benjamin Pierce's Types and Programming Languages book, it's mathematically rigorous but well motivated and structured for CS-oriented readers. It covers everything from pure lambda calculus to mutable objects, Featherweight Java, and type systems with subtyping.

shpongled · on July 26, 2020

Types and Programming Languages is one of the best textbooks I've ever read, even as a non-CS-oriented reader. I was able to start at the very beginning of the book with no solid grasp of the lambda calculus or type theory (or formal CS education), and I worked through the whole thing, implementing many of the different types systems in Rust[0]. TAPL does an excellent job of building on each proceeding chapter and explaining things in a very accessible manner. TAPL actually inspired me to start working on a Standard ML compiler and language server as well. [1]

[0] https://github.com/lazear/types-and-programming-languages [1] https://github.com/SomewhatML/sml-analyzer

Ericson2314 · on July 26, 2020

This is quite the gauntlet throw! And to be clear, this corresponds to a published paper by the same author.

I'm still reading this, in MLSub thesis Stephen Dolan argued that all the traditional attempts were mathematically illerate syntax mashing, and one has to get at the core of the semantics first. This paper seems to be saying all the math Stephen Dolan did was self-important overkill, at least in hindsight.

Very odd that the new algorithm has been studied empirically but doesn't have a proof. Whether or not it is actually correct, the theory implications are fascinating and need further study.

DalekBaldwin · on July 26, 2020

"This paper seems to be saying all the math Stephen Dolan did was self-important overkill, at least in hindsight."

It shows that you can implement algorithms that give the same results as MLsub without following the algebra so closely, but those algorithms were still designed with MLsub as a reference. It would be difficult to just stumble on a system like Simple-sub without that guidance.

And Dolan's emphasis on the algebra of types still provides a guiding perspective for future work. If you want to add subtyping to a language with a more sophisticated type system, or add new features to a language that already has subtyping -- even if the existing subtyping system was written using MLsub or Simple-sub as a starting point -- you should probably take a long hard look at it from an algebraic point of view to make sure they interact the way you want, rather than call it a day after finding the simplest way to add the feature while still yielding a type lattice.

lptk · on July 27, 2020

> It would be difficult to just stumble on a system like Simple-sub without that guidance

Actually, I'm not sure it would be that hard (Simple-sub author here). If you look at the core of the algorithm closely, you'll see it's really not special: you create type variables, add constraints on them, propagate the constraints, and that's mostly it. In fact, it's perfectly in line all previous work on subtyping inference, going back at least to Pottier's 1998 thesis — that thesis already had many ingredients of MLsub. The three big things MLsub brought to the table, in my opinion, were: 1. the idea to coalesce constraint graphs into compact type expressions, which are easier to read and understand; 2. using the algebraic insight to develop an elegant and simple principality proof (but I'm not sure it's really worth it; AFAIK the proof sketch I provided would also work); and 3. to provide a simpler subsumption checking algorithm. Specifically in terms of pure type inference (so, ignoring points 2 and 3), it's possible to imagine making the leap to compact types without the algebra insight. But this is just speculation, and it's of course impossible to tell a posteriori.

> If you want to add subtyping to a language with a more sophisticated type system, or add new features to a language that already has subtyping

In that case I think Simple-sub is a good place to start, as it does not presuppose all the algebraic structure that MLsub does; Simple-sub shows that principal type inference really doesn't need much at all form the type system. Now, you're free to add more structure to your subtyping lattice if you want to be able to simplify types more, and if you want to make checking subsumption easier, but the choice is in your hands.

> you should probably take a long hard look at it from an algebraic point of view to make sure they interact the way you want

That's still definitely true!

Ericson2314 · on July 26, 2020

Yes I am a huge believer in the algebraic view.

That's true even for implementations; I want to see the general ideas implemented as libraries so the implementation isn't an monolith mashing up ideas from various of many papers, but a composition of libraries.

Yes, there is a lot of rich stuff in the MLSub thesis which is hard to follow, but the functional perl approach here I also find confusing so there is no attempt to distill the essence of what's going on.

Towards the bottom of the accompanying paper (my PDF viewer lags less than my browser!) there is a little bit of theory discussion, but I would like to see more. It would be very good if someone could take some of the tricks here and present them more abstractly.

lptk · on July 27, 2020

> the functional perl approach here I also find confusing so there is no attempt to distill the essence of what's going on

Sorry to hear that! I guess it depends on your preferences. Personally, I have a pretty "operational" mindset, so I'll find an algorithm easier to understand than its specification in terms of abstract algebra.

The whole thing started when I tried reimplementing MLsub at an MSR hackathon, and found it unnecessarily difficult — all the operational insights had to be painstakingly extracted from the thesis and barely-documented OCaml implementation (like others, I found the paper insufficient to reimplement the approach satisfyingly, for instance see http://gallium.inria.fr/blog/safely-typing-algebraic-effects...). I wrote this paper so other type system implementers wouldn't have to go through it again!

lptk · on July 27, 2020

Hi, author here. Just wanted to say that you should read the paper rather than the blog post, as the paper is more recent. It's in open access (and CC-BY license) here: https://infoscience.epfl.ch/record/278576