Hacker News new | past | comments | ask | show | jobs | submit login
Type system updates: moving from research into development (elixir-lang.org)
282 points by weatherlight on July 4, 2023 | hide | past | favorite | 75 comments



I've used Elixir professionally for 6 years and have been very impressed with how it's been managed. All additions seem very careful and deliberate. I can't think of any regression or issue I've had as the language and tooling and standard library have improved.

Given that, I'm excited to see how this works out. Adding static types has the potential to be very disruptive but I trust the team to do it in a way that will work very well with existing Elixir apps and the BEAM VM.


I'm interested to see how this affects Gleam.


I don't think it will. Elixir, if everything pans out, will be gradually typed. the new type system will be sound, but will be very different than other type systems.

Gleam from Day 1 is going to be statically typed using a Hindley-Milner type system/Algorithm W. It's battle tested. If you come from a lang like Rust, OCaml, Standard ML, etc its type system is going to be very familiar to you.


Additionally, Gleam is very different as a language. Erlang is only one of its targets, another one is JavaScript (and I see many people just using it for that alone). I'm personally writing a static blog generator for myself that'll run on the Node target.

So I don't think this will have too much effect on Gleam.


As a very frequent nagger about the lack of type system of Elixir, this made my day:

> The type system will extract type information from patterns and guards to find the most obvious mistakes, such as typos in field names or type mismatches from attempting to add an integer to a string, without introducing any user-facing changes to the language.

> By propagating types from structs and their fields throughout the program, we will increase the type system’s ability to find errors while further straining our type system implementation.

Congrats to them.

I still wish it had been implemented 10 years ago, but, hey, that train has sailed a long while ago. (I know, trains dont sail. Or, as we say in Elixir-land, "** (MatchError) ...")


Elixir got traction as a Ruby on Rails replacement in combination with Phoenix. That was deliberate. The core modules and functions got names identical or close to their Ruby counterparts. It makes sense that they aimed a dynamically typed language to another dynamically typed language. If they aimed at Java developers they would have to add a type system since the beginning. That's extra work though.

I got into Elixir because it is dynamically typed. I wouldn't have considered a statically typed one. Luckily I find enough customers with Ruby and Python. The last Elixir project was from a couple of years ago. I didn't have to use something statically typed since 2012, I think. Too tiring to write.


Folks tend to understate the influence of Erlang and overstate the influence of Ruby on Elixir. In a nutshell: Elixir is dynamic because Erlang is dynamic. :)

Ruby did influence the syntax and the names in the standard library, but the latter was also done in a more "democratic" fashion: I would look into Ruby, JavaScript, Clojure, Haskell, and choose a name that was common and closer reflected the semantics that would fit Elixir (for example, the term "protocol" come from Clojure).

The most curious case is the "Regex" module (which is named Regexp in Ruby). I quickly noticed that both "regex" and "regexp" were very common, Erlang simply used "re", so I had no clear direction to lean towards. The decision came by getting the top 50 languages at the time and using Rosetta Code to pick the most common term (which was regex).


Proposed type system described in more detail here. https://www.irif.fr/_media/users/gduboc/elixir-types.pdf


I seemed to remember one of the authors from some previous work on formalizing multimethod dispatch, so I looked him up again..

He seems to have written a ton of interesting stuff on PL theory!

https://www.irif.fr/~gc/papers.en.html


Giuseppe Castagna is a very well known PL researcher, and very seasoned when it comes to gradual types and "structural" types (the static version of duck-typing), which seems like a great fit for Elixir.


Ah, thanks for that, I was wondering where the details were! Adding this to my pile...


Is set theory more useful than category theory for dynamic types? Seems like set theory is less constrained, like a set can be part of itself by nature, whereas category theory generally doesn't allow mixing containers and elements like that out of the box IIUC. Seems like category theory best suits Haskell and such where types are more precisely defined (something that could be an int or a bool needs an ADT wrapper), whereas set theory might make sense for completely dynamic types, where something could be an int or a bool or whatever else by nature.

But I'm entirely unsure. Maybe it is completely unrelated to why set theory is used as a basis here. Just wondering.


https://www.irif.fr/~gc/papers/set-theoretic-types-2022.pdf

This research paper by Giuseppe Castagna presents an in-depth exploration of programming with set-theoretic types, which include union, intersection, and negation type connectives. The author argues that these types are not just useful, but necessary for typing some common programming patterns, and they play a crucial role in precisely typing various language constructs, from branching and pattern matching to function overloading and type-cases.

What sets this work apart is the extension of the theory of types known as semantic subtyping to include polymorphic types. This is a significant step forward as it allows for a more expressive and precise type system. The paper also discusses the design of languages that use these types and presents a theoretical framework that covers all the examples given in the presentation.

one of the key takeaways from this paper is that current programming languages are unable to infer intersection types for functions without explicit annotations. This is a limitation that could impact the development and efficiency of certain programs. The author presents three effective restrictions of this system, each with its own trade-offs, which could potentially guide the design of future programming languages.

The paper concludes with an overview of other aspects of these languages, such as pattern matching, gradual typing, and denotational semantics. These insights could have far-reaching implications for the development of more expressive and precise programming languages.

this paper pushes the boundaries of what we understand about type systems in programming languages, Elixir is going to be the first general purpose language to implement such a type system.


A set being a member if itself (or not) is the source of the famous paradox that destroyed the naive set theory %)

In practice, you usually need either a definite sum.type (int | str | null), or the all-encompassing type, like Any. I think both cases should be adequately representable in a categorial language — am I wrong?

You may need to represent a sum type that includes Any (or a similar type) if you.allow type-based signature overloads for functions. I wonder if this may represent any obstacle to a caregorial approach.



I thought the principle of the excluded middle was the ultimate source of that paradox, not set theory itself?


No, Russel's paradox and the Law of the Excluded Middle have not much to do with each other. (It is not easy to prove such a negative statement, but I googled them together, and found absolutely nothing.)


Both the Russell's paradox and the Curry's paradox (yes, by Haskell Curry) are built on referencing a rule from itself, on breaking the contextual nesting. In a sense, it's like using `goto` to jump inside a loop, thus breaking the loop invariant.

Practically all other paradoxes can be reduced to one of these two.


> A set has the least structure. It's a discrete category, with no morphisms other than identities. Conversely, the category of sets has the most structure, since its morphisms don't have to preserve any structure.

https://twitter.com/BartoszMilewski/status/16743572724981104...


This is incorrect. The number of morphisms is not what makes mathematical objects more or less structured. What is correct is that the category of sets does not have constraints for morphisms between its objects. Bartosz is equating morphisms and "structure" which is not how most mathematicians think of what it means for a category or an object to be structured.


It seems incorrect to state that his perspective is categorically incorrect. The term “structure” lacks a precise definition and his interpretation isn’t completely invalid.


It is correct, in the best possible way.

Having least structure is a soft/aesthetical statement, and the explanation is right after it (a correct definition).

I wish all the math looked like this, first a soft/aesthetical statement, then right after the correct math.


He prefaces the statement with "my intuition tells me", so perhaps he is much less categorical (pun intended) about the claim than you interpret it.



"Set-theoretic type theory" is working with the lattice of all subsets of possible values, where a "type" is one of these subsets. This lattice has plenty of structure, but sure individual subsets do not.

The category of sets is different from this lattice, since it allows arbitrary functions between sets for its morphisms rather than just inclusions.

"Set-theoretic types" have a meaningful notion of overlap. Usually types in other type system tend to be practically disjoint, like objects in a concrete category might be sets but the category itself doesn't give language to check whether the objects are disjoint sets.


Interesting definition of most structure


> a set can be part of itself

A set cannot be part of itself in an axiomatic formulation of set theory. Naïve set theory, one that allows an unrestricted general comprehension, has been thoroughly rejected.


> A set cannot be part of itself in an axiomatic formulation of set theory.

This isn't quite true. In practice, you're correct: "set theory", generally referring to ZF (or some closely related derivative thereof) with the Von Neumann Universe, doesn't allow sets to contain themselves. But it is possible to axiomatize set theory where sets can contain themselves [0].

This replaces the axiom of foundation with the axiom of anti-foundation [1], so it's not naive set theory but it is an axiomatized non-well-founded set theory.

[0]: https://plato.stanford.edu/entries/nonwellfounded-set-theory...

[1]: https://en.wikipedia.org/wiki/Aczel%27s_anti-foundation_axio...


My understanding was that you just had to get rid of the law of the excluded middle. Then things like Russell's Paradox can be stated, but evaluate to neither true nor false, and be fine. Is that not correct?


It's more to do with unrestricted circular self-reference than excluded middle but what you're saying makes sense. Without excluded middle there is no contradiction. There is something called Curry's paradox which is similar. Excluded middle is not involved in Curry's paradox but circular self-refernece is still the culprit for the paradoxical conclusion.


Sorry, but could anyone share some background (reading, links, etc.) on set vs. category theory as applied type systems?


Mostly about proof assistants, but type systems are fundamentally related https://mathoverflow.net/questions/376839/what-makes-depende...



Very excited to hear about this. Elixir is a fantastic language but tooling lags behind some other options, a type system should help with that.


> tooling lags behind some other options.

Just out of curiosity. What other options?


I use IDEA and most typed languages I write (eg Java, Go, Python with annotations) have better support for things like automated refactoring, bug detection via static analysis, etc


Dialyzer gets you pretty far, Especially since Elixir is dynamically typed.


After spending 6 months working on an Elixir codebase, and spoiled by languages with proper IDE support, I'd say Dialyzer is nice... But doesn't get even close to what I'm used to. Sometimes the benefits of the BEAM and the OTP outweight these drawbacks, of course, but to me it sucked the joy out of programming (I'm very type oriented when writing and, more importantly, reading code).


> (I'm very type oriented when writing and, more importantly, reading code).

Me too!

Have you learned any tricks for getting up to speed on large code bases with very limited static typing?

In recent years I've mostly worked on large, complex Python systems. The consequences of (undisciplined) dynamic typing not only sapped my joy in programming, it really burned me out.

Some people are really good at living with code bases like that. I'm not one of them, but I'm still hoping to find some way to bridge the mental gap.


> it really burned me out.

For me it didn't go that far, fortunately, because I managed to jump ship on time... But it was getting _tiring_.

> Have you learned any tricks (...)?

My current apprach is: don't. These days I usually work in Rust, C# and sometimes Typescript. I'm only open to working with dynamically typed languages for short periods of time, as a side quest of the main task (e.g. for my last contract I had to read quite some C code, which somewhat resembled dynamically typed code, but I spent even more time writing Rust, so it was OK).


> My current apprach is: don't.

Eh, thanks. That's my approach as well. I was just hoping for some silver bullet I guess.


Yeah, for me it was confusing to see that many programmers seem to thrive without static typing, and in fact the 6-month Elixir project was an experiment to see if I could do the same :P (to me, Elixir is still the most appealing of dynamically typed languages)


Personally I am most affected by language features that didn’t exist before September, 2021.


Are you an AI?


Are you?


That face when you realise that it's all Actual Intelligence and it's just some of it has extra steps.


Elixir leverages Erlang/OTP and the BEAM which is basically a whole platform for applications. Probably only the Java / JVM ecosystem is on par.


The tooling in the elixir ecosystem is excelent. The only lang/ecosystem that I think does a better job is maybe Rust.


I love Elixir, I use it daily, but I have to disagree. Elixir tooling is in my opinion one of the only sore spots of the ecosystem and its quality is seriously lagging behind the vast majority of other languages. When I say "tooling", I'm talking about language servers and overall editor support.

As much as I appreciate the work the team has done, I think we have to be honest about the good and bad parts. ElixirLS has never worked quite right for me, or most other Elixir developers I know. It's often painfully slow to respond even in small project, it often gets stuck in obscure failures that require an editor reload to get it fully working again. If you run into any kind of issue with it, the boilerplate response it always: Try running `rm -rf .elixir_ls _build` which sometimes works, but usually not.

Javascript, Typescript have immaculate tooling in this regard, same goes for Java, C#, Go, even Python (on IDE side, the package management situation is of course a dumpster fire). I'd go as far as to say that any language supported by Jetbrains has better tooling than Elixir does.


https://github.com/elixir-tools

Check out next-ls aims to replace ElixirLS.

https://www.elixir-tools.dev/next-ls/

There's a podcast/interview from its creator below.

https://podcast.thinkingelixir.com/153


> It's often painfully slow to respond even in small project

I had that problem with NeoVim's Mason plugin that manages language servers, then I just cloned `elixir-ls` and made a super small script to update it daily from GitHub and recompile it. I use that in NeoVim instead and it works near-instantly. Give it a try.

I don't disagree that IDE support can look subpar but I'd also venture a guess that proper IDE support is very first-world problem. You won't find yourself working on projects with millions of lines in Elixir ever, and thus not having e.g. full-blown IDE refactoring has never been a problem for me or any other Elixir dev I know.

All that being said, literally nothing I have ever saw was able to beat Golang's and OCaml's language servers. They just work and are amazingly fast to boot. It's simply a joy coding in those languages with their LS. Rust is trailing closely behind but it's also a fact that its LS is objectively slower -- still, they seem to have made a lot of strides on that front lately and it's much better compared to even one year ago.


I use VSCode at work on our Elixir codebases so I feel your pain trying to get ElixirLS working, though when it does work it's pretty good. However, on my personal laptop I've been trying out the Zed editor which has Elixir language server capabilities out of the box (not a user installed plugin) and haven't had any issues at all.


Haven't heard of Zed before, feels pretty snappy! Will try it out some more on other projects, though I can't tell how to set it up with PyEnv packages off the bat.


> Zed editor

macOS only, closed source, and VC funded :(

I don't work on a Mac, so it's a non-starter for me, but I'd be very reluctant to switch my workflow to an early stage VC funded product because I don't like having my rug pulled.


The command line tooling is pretty good, the elixirls/ide integration is pretty poor compared to other languages I have worked with. Some trivial examples while learning live view. Using : in pattern matching brings up erlang modules, instead of the atom I used elsewhere in the file. html.heex and the ~H sigil can’t seem to support both html and elixir syntax intellsense at the same time. I also somehow got def in intellsense to not make a new function but instead autocomplete to def(). Finally, after hearing how great Ecto was I was hoping it could provide some sort of hints based on the names of tables/rows I had provided in the migrations.


Sorry, can’t agree on that one. I like to play with elixir in my spare time, love the language. There is a ton of great tooling. But I use C# and Rider at work, and the IDE experience is light-years better.


There seems to be a pattern of introducing static types into previously weakly typed or dynamically typed languages. I think this is because people have written many large projects in them, and without types, it's quite hard to enforce an API or document what to pass into it if there are no types.

The drawbacks to statically typed languages have become way less significant since most of them have "var" or an equivalent which saves on typing.


It's also not your "standard static" type system. It's a new type system that novel using Set Theoretical Types. Which is kind of interesting. because if Elixir can get this working properly, it would also probably work for other Strong Dynamically Typed languages.

Imagine if Typescript's type system if it were actually sound.

The research paper is here if anyone wants to read it. https://www.irif.fr/_media/users/gduboc/elixir-types.pdf


Elixir never really didn’t have a static type system, this is just improving it.


"var" does not "save on typing", at least not in C#. A variable declared with var is still statically typed. Its type is just inferred from context. The compiler will assign it the most specific type possible from that context. That's the important point - this occurs at compile time, not runtime.


You have perhaps misconstrued what your parent was saying.

Typing has two meanings, the one intended here wasn't about types - it was about keyboards. The var keyword and similar syntax in modern languages is less typing where "typing" means pushing the little buttons on the keyboard, and the associated squiggles appearing on a display.


Hah yeah you might be right lol - var definitely does save on keystrokes.


Sorry for the ambiguous usage:)


> most of them have "var" or an equivalent which saves on typing

Do you mean type inference? Most typed-languages have that, even Haskell/Scala.

It gets iffy on dependent type systems like Agda/Idris.


Scala type inference isn't as powerful as Haskell's primarily due to nominal subtyping. Dependent type systems are so power that they defy type inference (in general, the more expressive your types, the harder it is to infer subtyping relationships, which is the primary complexity of type inference).


Indeed, even C has caught up!

#define let __auto_type


Wow, I learn something new every day!

https://gcc.gnu.org/onlinedocs/gcc/Typeof.html

Having learned (old) C in college and taking a closer look at it lately, I was surprised how much you can make it look “modern”, what with _Generic, wchar_t, booleans and even this __auto_type.

It seems like with a couple #define macros and some discipline you can make it look and behave more ergonomic.


Is this type-system guaranteed to be sound? (Does type checking ensure no runtime type bugs?). Also, can it infer all the types if none of the types are actually annotated?

Also, what is the novelty about this new type system?


Yes. it's guaranteed to be sound. As per what makes it unique, from the paper:

    We present a gradual type system for Elixir, based on the framework of semantic subtyping [10, 19]. This framework, developed for and implemented by the CDuce programming language [3, 15], provides a type system centered on the use of settheoretic types (unions, intersections, negations) that satisfy the commutativity and distributivity properties of the corresponding set-theoretic operations [19]. The system is a polymorphic type system with local type inference, that is, functions are explicitly annotated with types that may contain type variables, but their applications do not require explicit instantiations: the system deduces the right instantiations of every type variable. It also features precise typing of pattern matching combined with type narrowing: the types of the capture variables of the pattern and of some variables of the matched expression are refined in the branches to take into account the results of pattern matching
CDuce programming language: https://www.cduce.org


I had to double-check that, knowing CDuce as an XML-oriented language from when XML was hip around almost twenty years ago - and indeed it's that language for manipulating XML. It's been a while but I thought the point was to experiment with static typing of XML transformations, but ultimately type inference complexity results hit a wall and weren't convincing from a practical PoV. But what do I know - after all, adding types on top of a language that isn't statically typed seems like a pointless exercise to me when there are plenty statically typed languages. I guess it might make sense for a Python code base that's gotten out of control maybe?


It doesn't make sense for certain kinds of languages. But Elixir and Erlang unique in a way that languages like Python, Ruby, etc. are not.

A semi-mainstream BEAM language with static typing would be a huge boon. For those of us who have problems that BEAM fits really well, it's like saying Typescript doesn't matter because other statically typed languages exist. Which is really missing the point.


> It's been a while but I thought the point was to experiment with static typing of XML transformations, but ultimately type inference complexity results hit a wall

Type inference for set theoretic type systems is expensive and that has not changed. It is not surprising either: inference is harder or undecidable in more expressive type systems.

We address this by providing reconstruction based only on patterns, guards, and return types. Other than that, inference is not a major limitation to us, given you don’t need to declare types today anyway.


Nothing can infer _all_ types. Or even much of a subset. I was surprised to learn that Hindley Milner can't infer "id" as a function argument, and attempting to extend it to do so results in undecidability: https://stackoverflow.com/a/74869648/171121


how does the type system handle `receive` in erlang/elixir? does it just require every `receive` clause have a catchall case (meaning every `receive` is any -> T) or does it only guarantee type checking in the absence of external messages?


When you receive, we will use the pattern and guards from the "receive" to lift type information, but we won't verify the messages between processes in our initial efforts. Personally, as future work, I am more interested in typing behaviours, because that's how most users define and interact with processes.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: