Hacker News new | past | comments | ask | show | jobs | submit login
How the OCaml type checker works (2022) (okmij.org)
219 points by mooreds 27 days ago | hide | past | favorite | 42 comments



Before opening the article, my first thought was "wdym, it's HM and Algorithm W"

The very first sentence was: > There is more to Hindley-Milner type inference than the Algorithm W.

I guess congratulations to the author for knowing the audience well enough


That remark is actually more interesting than you think. As groundbreaking as it was, algorithm W iss far too slow for non-toy languages. All modern HM languages (that I know of) use some form of union-find trickeries, as pioneered by the one presented in the blog post (but also present in resolution-by-constraints approaches employed by Haskell and Scala).

So, in fact, it's actually never algorithm W in non-toy languages. ;)

Side note: this article is originally from 2013 and is considered a must-read by any would-be hackers trying to modify the OCaml typechecker (it's cited in the documentation).


In fact, those union-find trickeries come from the same paper that presented algorithm W, where they were named algorithm J. W was known from the start to be more useful for proofs than implementation:

> As it stands, W is hardly an efficient algorithm; substitutions are applied too often. It was formulated to aid the proof of soundness. We now present a simpler algorithm J which simulates W in a precise sense.

https://doi.org/10.1016/0022-0000(78)90014-4


Isn't the union-find trickery "just" standard almost-linear unification as discovered by Paterson-Wegman and Martelli-Montanari around 1976? Or is there more to it in the setting of Algorithm J?

I'm actually a bit surprised that it took so long to discover these formulations of unification. I wonder what Prolog systems were doing at the time, given the importance of efficient unification.


Paterson-Wegman and Martelli-Montanari are worst-case linear, but Algorithm J just uses the earlier almost-linear union-find approach: unification variables are represented as mutable pointers to either a) nothing, or b) another type (which may itself be the "parent" unification variable in the union-find structure). Unification is then just a recursive function over a pair of types that updates these pointers as appropriate- and often doesn't even bother with e.g. path compression.


The Wikipedia articles claims that W is efficient, but only for a core language without highly desirable features like recursion, polymorphism, and subtyping.

https://en.m.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_...


OCaml without recursion would not be the same.


The subtitle, "or what polymorphism and garbage collection have in common", is another hint there may be more to TFA than its HN submission title indicates.


> OCaml generalization is based on tracking of so-called levels of a type.

The level tracking reminds me of a recent paper exploring SimpleSub[0], a simpler alternative of adding subtyping to ML-style type systems. It also gets rid of the algorithm W's repeated generalization (introducing foralls and turning a type into a polytype) and instantiation (changing the universally quantified type variables to fresh type variables). They have slightly different operations on levels, e.g. extrude. I wonder if this level tracking is independently invented again.

[0]: https://dl.acm.org/doi/pdf/10.1145/3409006


Not independently invented again. Lionel is fully aware of the level tracking in the OCaml type-checker; in fact the OP is cited in the Simple-sub paper (section 3.5.1, page 15).


Say whatever you want, but this is how websites should be designed.


Advocando pro diabolō, I disagree. Since the width just expands to the size of my screen, and I have a rather wide screen, I found this design (or lack thereof) sufficiently obnoxious that I added a touch of custom CSS.

Setting a static max-width, increasing margins, and playing with the font made the page much more readable for me.


Beginner question: where to start to understand how to design a type system?


Types & Programming Languages

https://www.cis.upenn.edu/~bcpierce/tapl/


OT: how come there are so many OCaml posts recently? Genuinely curious!


Some popular streamers have dabbled in OCaml this year, sometimes calling it "the Go of functional programming", which probably set off a small wave of people tinkering with the language. OCaml has also gotten gradually better in recent years in terms of tooling, documentation, standard library, etc.


I think they were saying that Gleam was Go of functional programming? OCaml may be like Go compared to Haskell but IMHO Gleam really embraces simplicity and pragmatism.


I would say some other reasons OCaml is similar to Go is that the runtime is very simple, performance is on par and the compilation times are very fast. It also markets itself as a GC'd systems language similar to Go. I think a seasoned OCaml would be able to guess the generated assembler code.

I suspect that Gleam is quite different in that regard.


Thankfully it has a modern type system, though.

As for the GC systems language, there is even a book about it,

https://ocaml.github.io/ocamlunix/


The "Go of functional languages" title should go to Haskell.

The Haskell's STM and channels implemented in it allow for most (or all) of the Go "select" statement, but in a library, not language.


Go prioritizes simplicity and pragmatism which is much more like OCaml than Haskell.

Haskell is more like a Rust of FP. But Rust is also much more pragmatic than Haskell.


Anecdotally, I feel like OCaml is growing in popularity, probably due to ecosystem improvements. Stuff like dune and other OCaml Platform tools becoming mature, multicore support, recently first-class Windows support, etc.


I also suspect that people are more open to a language like OCaml. With Rust and javascript being so popular, a lot of constructs in OCaml will not seem so foreign.

OCaml is in many ways a sane Typescript or a functional version of Go.


Ditto, it feels like more people are coming around to the ML style type systems. I'm hoping Gleam will fill the void with a scalable BEAM backend and compiling to JS with Lustre on the frontend (or even just serverside with htmx).


Having known OCaml since Caml Light days, Go's type system has nothing to do with OCaml, it is exactly the kind of languages the Go community rants about.


Yeah I don't think the Go community would like OCaml very much. But I do think there are some similarities, in that they both have great compile times, as a functional language OCaml is quite simple, they are both GC'd systems languages and both have predictable runtimes. So if you appreciate a lot of ideas in Go, but find them too conservative and the error handling too tedious, OCaml might be a good choice.


OCAML on Windows was such a pain several years back. It's good to hear that they've improved on it.


OCaml is the one language I’ve used whose standard library is great to read. It’s a very developer-friendly language but in all of the ways that make it popular on HN and rarely used in the real world.


Before OCaml multi core it was a non starter for many applications. Now, OCaml can be used for almost anything!


I really rooting for the Riot framework. It's based on the actor model and makes using multi core in OCaml a breeze.


Well its feature list seems positively delightful: https://github.com/riot-ml/riot !

Basically, it seems, it's Erlang for OCaml. Hot reloading would be a cool feature, though, but I can see why it's not implemented, at least not yet.. I recall the OCaml native toplevel is able to load code in dynamically, so that could be the mechanism to do it.

It seems to use open types for handling messages (per just looking at https://github.com/riot-ml/riot/tree/main/examples/3-message...) reducing the benefits of exhaustiveness checking, but it still seems rather interesting!


I don't know if there's any reason like a project, derived language, or some new feature. But there are always posts and comments about OCalm at https://old.reddit.com/r/functionalprogramming/. So it's not really a surprise to me that it gets some waves of posts every now and then.


It feels like we're (IT, hackers, et. al.) finally growing up.


A complete novice question: I think I remember reading that Rust is moving from one of the standard type checking algorithms (this one?) to general purpose Z3 SMT for speed. Does type checking happen to have the same complexity as SMT? Or Z3 is just so insanely well optimized with heuristics and all that it happens to give better overall performance then problem specific checking algorithms with better theoretical performance (eg this one)?


I think you remember wrong. Rust is moving towards using a datalog engine, but for lifetime resolution (the project is called Polonius), not for type checking. Datalog engines have some similarities with SMT solvers so this might be what you're thinking of.


The Polonius rules were formulated using Datalog, but the implementation that will ship in rustc does not use Datalog: https://blog.rust-lang.org/inside-rust/2023/10/06/polonius-u...


The implementation based on the datalog engine was also found to be generally too slow and was replaced with an ad-hoc dataflow algorithm.


Having built a type checker with Z3 in the past, the simple answer to “does type checking happen to have the same complexity as SMT?” is no. That’s because the “T” in SMT, “theories” can be pretty much anything - they’re essentially plugins.

A more nuanced answer is that many problems are reducible to SAT, meaning that the answer can technically be yes, but a type checker that simply prints the message “UNSAT” upon failure isn’t very useful!


Unification is at the heart of type checking and unification is doable with a SMT solver. So a SMT solver can be a type checker if you want which should show that SMT is indeed much more complex than type checking.


Title should say (2013) not (2022)


Is says "Last updated January 9, 2022" at the bottom of the page. But I guess the article is about the type checker version of February 2013?


This was updated in 2022 but was indeed written and originally published in 2013.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: