Sum Types Are Coming (2015) (chadaustin.me)
113 points by signa11 on Feb 16, 2018 | 113 comments



Once you get the hang of algebraic data types (ADTs), definitely check out generalized ADTs (GADTs). They reduce the amount of code required to express a solution, make the type system do a ton of work for you to prevent bugs from compiling, and make it easier to embed domain-specific languages in your code. A lot of the GADT tutorials out there are a little abstract for my taste and I wish there were more examples with tangible applications, but this one is decent:

http://joeyh.name/blog/entry/making_propellor_safer_with_GAD...


If you want to have fun, I also have a small tutorial on how to make a type-safe printf function with GADTs (and that technique is actually used in the OCaml stdlib!)

https://drup.github.io/2016/08/02/difflists/

I also have some slides (more entertaining and probably a better introduction) here:

https://www.irif.fr/~gradanne/papers/gadts.pdf


Why do they reduce the amount of code?

They add the ability for constructors to carry constraints, which aids expressivity in some cases.

But the majority of GADT uses purely add safety. You can use an ordinary ADT without a type variable correlated with the data constructor, and crash at runtime if the constructors don't match.


I thought GADTs would be useless to me when I first heard them explained as imposing additional constraints on ADTs. Whilst technically true, that explanation misses the point.

The real power of GADTs comes from the fact that you can use the type at compile time. This means that in OCaml I can do something like:

  type 'a t =
  | Int : int t
  | Float : float t

  let print : type a. a t -> (a -> string) =
    function
    | Int -> string_of_int
    | Float -> string_of_float

I can then do (print Float) and receive string_of_float. I can do (print Int) and receive string_of_int. Both have drastically different types, but due to the constraints that the GADT imposes it is fully type safe and the type is known at compile time.

Although this is a slightly contrived example, there are nearby scenarios where this proves immensely useful and where plain ADTs fall short. Think serialization: with an ordinary ADT you would first have to convert your values into the ADT's representation, which defeats much of the purpose. If you're interested in something practical, take a look at [0] (shameless plug), a very early alpha version of a similar idea allowing conversion to and from JSON.

EDIT: this is not inlined as I recall. False memory. The reason it isn't is probably because you can use existential types in which case the type would not be known at compile time.

[0] https://github.com/dpwm/typespec


> Why do they reduce the amount of code? [...] crash at runtime if the constructors don't match.

You answered yourself: If you cannot have these runtime type mismatches, you don't need to write the corresponding bookkeeping and checking code.


You don't have to write it - you just have to avoid -Wall :-)



The last couple of years I've been doing a lot of exploring of different languages and language features. Sum types are one of the few features I've come across that seem to have the potential for large, practical benefit. Not only do you get the potential to eliminate null from a language with Option or Maybe types, but many cases where you might use multiple classes and files to express some hierarchy can be replaced with just a few lines of code that are so easy to read you can even show them directly to the non-programmers working on the project, to work through the definition together.

This works especially nicely when the syntax is built into the language in a succinct way, as in F#/OCaml (and possibly others as well!)
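To make that concrete, here is a minimal Haskell sketch (type and function names invented; the same shape reads almost identically in F# or OCaml). The whole "hierarchy" is a few readable lines, plus one function consuming it:

    data Shape
        = Circle Double            -- radius
        | Rectangle Double Double  -- width, height

    area :: Shape -> Double
    area (Circle r)      = pi * r * r
    area (Rectangle w h) = w * h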


It's also a convenient way to implement "value-based" state machines (where the entire machine is a single type with a value for each state, as opposed to type-based machines, where each state is implemented as a different, and possibly completely incompatible, static type).
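A minimal sketch in Haskell, with invented states for a network connection: the entire machine is one sum type, one constructor per state, and a transition is just a function on that type.

    data Conn
        = Disconnected
        | Connecting Int   -- retry count
        | Connected String -- session id

    -- One possible transition function (the state shapes are made up):
    step :: Conn -> Conn
    step Disconnected   = Connecting 0
    step (Connecting n)
        | n < 3         = Connected ("session-" ++ show n)
        | otherwise     = Disconnected
    step c              = c  -- Connected stays put in this sketch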


+1. They are a huge help for managing things like connection state. You can create powerful, statically-enforced invariants, like that a lifetime of a socket is 1:1 with a state of your state machine.


Note that TypeScript and Flow have implemented sum types since this post was published.


The lack of a match construct or implicit tagging to go along with them makes it really weird though.


No, they haven't. Union types and sum types are semantically very different from each other.



Sum types are a universal construction: you take any two types, and produce yet another type. In particular, you can sum a type with itself. Disjoint unions give you nothing of the sort.

In other words: learn to mathematics.


The tone of the parent comment is a bit aggressive, but the content is relevant.

Just in case it's not clear, the union of sets "merges" elements that are identical whereas in the (direct) sum each element gets tagged with where it comes from.

Formally, we have a set that indexes over the sets we want to sum, and then we just Cartesian product each set with its index before unioning. In abstract nonsense we usually write

A ⊕ B = (A × {0}) ∪ (B × {1})

where ⊕ is the direct sum operator. So the direct sum of a set with itself is quite different from a plain union. For more than two sets we just index over things in the obvious way and write

⨁ᵢ Aᵢ = ∪ᵢ (Aᵢ × {i})

What parent means by universal property is that any collection of functions {fᵢ : Aᵢ → C} can be decomposed into the injections {gᵢ : Aᵢ → ⨁ᵢ Aᵢ} and a single, unique g : ⨁ᵢ Aᵢ → C where fᵢ = g ∘ gᵢ.

With a little thought, the universal property above seems obvious, and with a little more work, you can see that it is actually equivalent to the definition of direct sum above.

Thus we can actually take this universal property as the definition of direct sum and derive the original. The cool thing is that the universal property doesn't refer to set-specific operations---just functions/maps---so we actually have a definition for the direct sum of things more general than sets. That is, we can form the direct sum of things like monoids, groups, certain classes of differential functions, etc.
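In Haskell terms, a sketch of the binary case: `Either` plays the role of the direct sum, `Left`/`Right` are the injections gᵢ, and the Prelude's `either` builds the unique mediating map.

    -- For any f1 and f2 into a common codomain c, `either f1 f2` is the
    -- unique g with g . Left == f1 and g . Right == f2.
    mediate :: (a -> c) -> (b -> c) -> (Either a b -> c)
    mediate = either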

I could go on, but this road leads to madness^W category theory.


> Just in case it's not clear, the disjoint union of sets "merges" elements that are identical whereas in the (direct) sum each element gets tagged with where it comes from.

That's just a union. The disjoint union (https://en.wikipedia.org/wiki/Disjoint_union) is what you're calling the direct sum. The direct sum (https://en.wikipedia.org/wiki/Direct_sum) is an operation on algebraic structures to create new algebraic structures (e.g., one can direct-sum two groups to get a new group, and similarly rings, vector spaces, etc.), not just sets.


Whoops. That one slipped past. Thanks.

In case it's not clear, I just specialize tensor sum to the category Set; that's all. At the end I do mention how the tensor sum applies on a wider class of objects. In fact, any Cartesian category has (finite) tensor products, even if the objects aren't traditionally thought of as "algebraic structures", e.g. certain classes of topological spaces, Frolicher manifolds, etc.


A union type `A | B` in Flow is disjoint if and only if the types `A` and `B` are a priori known to be disjoint. There is no equipping `A` and `B` with fresh new tags to force them to be disjoint.


... Which isn't relevant to the clarification of mathematical terms in my comment.


It is relevant, because it means that while `|` may produce a disjoint union if its arguments are already disjoint, it is not guaranteed to produce a disjoint union in the general case. Hence, my original comment had a point.

As if that weren't enough, any talk about universal properties is utterly pointless as soon as object identities matter everywhere.


> In abstract nonsense we usually write

> A ⊕ B = (A × {0}) ∪ (B × {1})

Hardly nonsense! This makes clear the connection to sets in a way that words weren't doing for me.


> In other words: learn to mathematics

That breaks the HN guidelines by crossing into incivility. You'd stopped doing that on HN, which is great, so please don't resume.

This comment would be better if it (a) dropped the rude bit and (b) made the distinction from disjoint unions more clearly.

https://news.ycombinator.com/newsguidelines.html


> In particular, you can sum a type with itself.

The way you sum a type with itself in Haskell would be something like

  data Response = Success { result :: String }
                | Failure { reason :: String }
where each component of the sum is uniquely tagged.

From my understanding of the Flow documentation, the equivalent would be

  type Response = {| success: true, result: string |}
                | {| failure: true, reason: string |};
where the only material difference is that Flow needs to work with JavaScript, which doesn't have a built-in tagging mechanism, unlike Haskell.

So although sum types and disjoint unions are different concepts, they can easily be used to simulate each other.


What you propose breaks completely as soon as you want to define and use abstract types.


I think it should be possible. Can you give an example that can't be translated?


Union and intersection types make the following assumptions:

(0) There is a single physical universe that contains all values.

(1) The role of types is to “carve out” fragments of this universe.

(2) Say you have two values `x` and `y` of types `A` and `B`. Because it is possible to form the union type `A | B`, to which both `x` and `y` belong, the only way to make sure `x` and `y` cannot be conflated with each other is to make them physically distinct elements of the universe.

On the other hand, abstract types make the following assumptions:

(0) Each type is a “world” of its own, defined in terms of its logical structure. The values of a type are a mere consequence of this logical structure. Hence, it doesn't make sense to try to “split” or “merge” types.

(1) It doesn't make sense to argue that an int is different from a string, because it's just as nonsensical as arguing that they are equal. What they are is incommensurable.

(2) Say you have two different abstract types `A` and `B`. Because they could share an underlying representation type `T`, you cannot rely on values of different types having physically distinct representations.

---

Suppose we had a language with both union and abstract types. The contradictions between the assumptions behind union and abstract types would lead to hilarious consequences:

    signature ERROR_CODE =
    sig
      type error_code
      
      (* specification of operations on error codes *)
    end
    
    (* we use opaque ascription, so that the internal representation
     * of error codes becomes inaccessible to external parties *)
    structure ErrorCode :> ERROR_CODE =
    struct
      type error_code = int  (* prosaic representation *)
      
      (* implementation of operations on error codes *)
    end
    
    type union = ErrorCode.error_code | int    (* union *)
    
    (* suppose we also have flow typing so that this example works *)
    fun test (u : union) =
      if u is ErrorCode.error_code then 
        ErrorCode.printErrMsg u
      else
        printInt u
    
    val _ = test 42
You'd expect this code to print the integer `42`, but it actually prints:

    ERR_DOUGLAS_ADAMS : The answer to the Ultimate Question of Life,
      the Universe and Everything is the number forty-two.
Wasn't the fact that error codes are internally represented as ints supposed to be unknowable to external parties?


Ah, I thought you meant that my construction would behave differently from a sum type if abstract types were involved, but it appears that you are actually worried that having non-disjoint unions available breaks abstract types by revealing their implementation details.

I agree that being able to change the internals of a module without breaking external code is a good thing, but unlike other security guarantees, abstract types can break merely by having a stronger type system which is able to prove strictly more things (such as whether two types are equal). That makes implementing completely leak-free abstractions quite difficult.

It can even be hard to tell which language feature exactly causes the leak. In your example, it is not just the existence of the union type that causes the problem, but the operation "is T", which you probably assume to be of type "forall a. (a | T) -> Bool" or something. Obviously, that operation is doing all the work of looking at the underlying implementation and leaking it.

While you couldn't meaningfully type such an operation without union types, having unions in your type system does not mean that such a function will be available. Although working with unions without being able to tell the types apart would be quite inconvenient, you could nonetheless remove that ability from the language, either completely (effectively requiring all unions to be disjoint) or by having special markers to enable or disable it (similar to Haskell's type class constraints).

Flow doesn't go to such great lengths because it is still built on JavaScript objects, which don't really have a concept of privacy. If there were private symbols (i.e. unique properties which can only be accessed by the module which defines them), you could trivially implement abstract types similar to Haskell's newtype wrapper as { [some private symbol]: implementation-type }. Until such a thing is added, you'll have to limit yourself to disjoint unions to avoid abstraction leaks.


> abstract types can break merely by having a stronger type system which is able to prove strictly more things (such as whether two types are equal)

Indeed, which is why I'm not such a huge fan of overly powerful type systems.

> That makes implementing completely leak-free abstractions quite difficult.

It means the type system has to have the right amount of expressive power. Too little, and separation of concerns becomes difficult. Too much, and separation of concerns becomes difficult too!

It also means that we can't rely on types to prove everything we would like to prove about programs. This seems intuitively right, because type systems are first and foremost meant to be mechanically checkable, not to be flexible enough to express every possible software requirement.

> In your example, it is not just the existence of the union type that causes the problem, but the operation "is T", which you probably assume to be of type "forall a. (a | T) -> Bool" or something.

This is correct. But removing this `is T` operation would greatly reduce the appeal of union types in practice.

> you could trivially implement abstract types similar to Haskell's newtype wrapper as { [some private symbol]: implementation-type }.

Both Haskell's `newtype` and your proposal are poor replacements for actual abstract types.


Disjoint union in mathematics is exactly the universal construction you're calling sum types: https://en.wikipedia.org/wiki/Disjoint_union

(The Flow use of the name doesn't seem to quite match the mathematical definition.)


Confusingly, there are two different things called disjoint union in mathematics. One is what you mentioned, which is indeed the coproduct in the category of sets. But another one is simply the union of two disjoint sets, which might not be too common in programming, but is used quite a lot, say, in analysis and point-set topology. Flow's use of the term obviously refers to the latter.


The former is a (small) generalization of the latter, tacking on a tag to forcibly ensure the sets are disjoint. It's not worth thinking of them as different.

In any case, Flow doesn't require that the sides of the | are different.


Um, re-read the documentation:

> These disjoint unions are made up of any number of object types which are each tagged by a single property.

It is clearly a disjoint union in the set-theoretical, not category-theoretical sense.

Let's see their own example:

    // @flow
    type Success = { success: true, value: boolean };
    type Failed  = { success: false, error: string };
    
    type Response = Success | Failed;
    
    function handleResponse(response: Response) {
      if (response.success) {
        var value: boolean = response.value; // Works!
      } else {
        var error: string = response.error; // Works!
      }
    }
Clearly, this union is disjoint because the types `Success` and `Failed` were a priori known to be disjoint.


It doesn't require that. Flow seems to accept all sorts of variations on the following (including X | X):

   type X = { foo: string }
   type Y = { foo: string, bar: string }

   type Foo = X | Y
(Y is a subset of X in that case.)


But then it isn't disjoint.


How do they differ?


The Union of Int and Int is exactly Int; the Sum is not.

The union of two types contains all values that are in either component type. The sum of them contains all of the values of the first type, each one associated with a tag like “I came from the left side” plus all of the elements from the second type with a similar tag.

The Union of Bool and Bool has 2 values, the Sum has 4.

If we define Maybe[A] by unioning a "null" value onto the values of A, then Maybe[Maybe[A]] = Maybe[A]. If it's defined with a Sum, then that equation doesn't hold.
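A quick Haskell illustration of that last point; with a sum-based Maybe the two layers of "nothing" stay distinct, which is exactly what a plain union would collapse:

    outer, inner :: Maybe (Maybe Int)
    outer = Nothing      -- no result at all
    inner = Just Nothing -- a result was produced, and it was empty
    -- A union-based Maybe would identify these two values,
    -- collapsing Maybe (Maybe a) into Maybe a.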


Nomenclature varies, but I'm guessing the author of the comment is using 'union' as the sum analog of tuples (unnamed fields) and 'sum' as the sum analog of structs/records (i.e. named fields).


Sum types are one of the best things, as they empower very good `match` style conditionals.


I can attest to this. I've just done a project in F# -- my first experience using a statically typed functional language beyond messing around in a REPL -- and the expressive power of this combination is intoxicating. I may have gone overboard; I found it hard to write procedural code in F#, so I split up my procedures into steps, with a type to model each step and a sum type to model the whole procedure. Coming from C / Python, where I was used to procedural code mostly working most of the time and then failing in hilarious ways, this was a revelation. A language where I can make the compiler check my procedural logic for me! Liberating!


> I may have gone overboard; I found it hard to write procedural code in F#, so I split up my procedures into steps, with a type to model each step and a sum type to model the whole procedure.

State machines are a great thing, and modelling a state machine as a sum type is reasonably common (it's not the best modelling for all state machines, but it's very good for many).


I'm glad to see I'm not totally crazy for adopting that pattern! As I'm new to the language, I'm not sure whether I'm holding it right.


Does F# have a good REPL?

Which ML-style language has the closest thing to a Lisp-style REPL?


Not saying it's the closest to what you have in mind, but you could look at the Emacs Tuareg mode for OCaml. Here is a reference card: http://www.ocamlpro.com/files/tuareg-mode.pdf

If you are a SLIME fan, you might notice that some of the key bindings to interact with the toplevel (the OCaml word for "REPL") are identical in Tuareg.

That said, although this will sound tongue-in-cheek: Having used Isabelle and Coq with their interactive editors, I find their approach better than a REPL. In these systems, you execute code directly from your source file, as opposed to (conceptually) sending snippets of code to a REPL. There is no concept of code you have written in the source (or the REPL) which is not synced up with the state of the REPL (or the source code, respectively). I wish other languages had modes like this. There seems to be one for Prolog, but I've never used it and don't know how well it works in practice: https://www.metalevel.at/ediprolog/


I've been happy with it. It's a good place to isolate your type issues and figure out how to plug things together. It helped me learn a lot about how F# does its type inference.


Hopefully flow typing will be next. It's a really cool idea and I'm surprised it hasn't caught on more.

https://en.wikipedia.org/wiki/Flow-sensitive_typing


One of my favourite features of Flow is the ‘*’ type and having Flow handle the given type based on control flow via “if” checks. You can achieve it somewhat in TypeScript too, IIRC. Neither are quite as powerful as I would like, but then what I would like is basically magic!


Oooh, that concept has a name! This has been on the list of features of my dream programming language for a long time, but I didn't know this has a name and Wikipedia article and everything.


Are they really coming? As a student, I first encountered them in Ada 83 (discriminated records) in 1990, then in Gofer (a variant of Haskell developed at Oxford) in 1992, then in Caml Light (the ancestor of OCaml) the same year.


I think maybe "coming to the mainstream" fits better. I too saw it in Ada in school, and of course Haskell and friends have had them for ages.

There does seem to be a recent trend of popular, highly visible languages adding the feature, though. But I think the biggest thing is that the pattern-matching syntax that makes them so easy to use is becoming more common.

You can have Sum types without pattern matching, and vice versa, but it seems to be much more common now to expect any new language to include pattern matching.


One thing I really need is getting at "the type field". This is useful information by itself, and I want it most of the time - but I think in Haskell it's not possible to get at it without manually defining a "type type" and a function that computes that type from a given sum-typed value.

I don't think that lambdas/closures have such a problem in most programming languages. It's still unclear how to handle function pointers for lambdas/closures, but it should be a lot easier to make a lambda into a real function than changing a datatype definition when it's already used across the codebase.

So, I think sum types are really another one of those ideas which may prevent some bugs sometimes, but in the end are not generally super useful; they are a little too clever and end up causing more work than they save. At least unless there's a well-defined representation from which we can extract the type information, in which case I'd probably be happy to use them.


"but I think in Haskell it's not possible to get at it without manually defining a "type type" and a function that computes that type from a given sum-typed data."

You can do it with reflection, which somewhat confusingly Haskell calls "generics": https://wiki.haskell.org/GHC.Generics It's very old terminology, you'll have to forgive them. If you carefully trace the etymology back to the 1990s-ish when Haskell established that term, and trace the academic ideas of "generics" back that far, you can actually find the similarities with the old academic ideas of generics and what we today call reflection, but nowadays they are so separated it's really just of historical interest.
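As a sketch of the flavor, using the closely related Data.Data reflection rather than GHC.Generics (it's shorter to show; the Event type is invented for the example):

    {-# LANGUAGE DeriveDataTypeable #-}
    import Data.Data (Data, toConstr)

    data Event = ClickEvent Int Int | PaintEvent Int
        deriving Data

    -- Recover "the type field" by reflection instead of by hand:
    eventTag :: Event -> String
    eventTag = show . toConstr

    -- ghci> eventTag (PaintEvent 42)
    -- "PaintEvent"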

That said, very little code uses it, because generally in Haskell whenever you would want to do "OK, look at the type of a thing, then decide what to do based on the type, then do the thing" that almost always can be collapsed down to "Pattern match on the thing, then do the thing you wanted to do". You write what it is you want to do based on types into the pattern matching itself, collapsing the middle step where you have an explicit value representing the type into implicit knowledge contained in the branch of the pattern match that occurs.

I do understand why you are thinking that way and am not trying to say it's a bad thing in general. While it's not the first thing I reach for in my Go code, I do have some very reflection-heavy Go code that works that way. But in Haskell, there's almost (but not quite) always a better solution. If you could program the way you are discussing in Haskell, you'd find the type checker would be an even bigger jerk to you than it can be when you program the "right" way; I think you'd have a hard time making that code work with GADTs, and GADTs are very, very common in Haskell.


Could you give an example in a (possibly hypothetical) programming language of your choice of what you mean by "the type field", and what you would do with it? Maybe we could point you to something that does what you want.

Though I do wonder if reflection on type information (at runtime?) would be useful and easy for you if you find basic algebraic data types "a little too clever" and "causing more work than they save".


Regarding the type field, look for "enum EventType type;" in the article. (In practice I would not use "enum EventType", but simply "int" - but anyway).

What would I do with it? Like the result of (x+y), or any other kind of data, I can use it for whatever I need. I might simply want to count the number of occurrences of each "type". Often I'm simply not interested in the data besides the "type", and in these cases I find that case-matching is a lot of boilerplate. And it doesn't reify the "type" to authoritative runtime values. Which, as you say, might be done with reification support from the language, but I don't want to bother with that -- and it would not allow me to choose the possible values of the type field, which might be a requirement if the options are defined somewhere else.


> Regarding the type field, look for "enum EventType type;" in the article.

OK. If I understand correctly, you want to use that field in other contexts than doing a switch on it, whereas in Haskell the usual way of consuming the type tag is to do such a switch, as in the article's example:

    case event of
        ClickEvent x y -> <do click stuff>
        PaintEvent color -> <do paint stuff>
        -- there are no other possible cases, so no default case needed
> For example, I could separate a large array of sum-type data into separate arrays, and also process them separately.

Sure, but you can do that with a case expression as above. Here is a complete example:

    data Event = ClickEvent Int Int | PaintEvent Int

    -- separate a list of Events into two lists, the first holding the payloads
    -- of the click events, the other the payloads of the paint events
    separate :: [Event] -> ([(Int, Int)], [Int])
    separate [] = ([], [])
    separate (event : events) =
        case event of
            ClickEvent x y -> ((x, y) : clicks, colors)
            PaintEvent color -> (clicks, color : colors)
        where
            (clicks, colors) = separate events

    example = [ClickEvent 1 2, PaintEvent 42, ClickEvent 3 4, PaintEvent 23]
Running it:

    > separate example
    ([(1,2),(3,4)],[42,23])
You can now go ahead and process the list [(1,2),(3,4)] separately from [42,23].

(Depending on the exact specification and performance requirements, the code might be written a bit differently in the real world.)

EDIT: Ah, you edited your post while I was writing. Oh well. Regarding reifying constructors as integers or whatever, yes, you might have to write a two-line function to do that.
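For instance, something like this (a sketch reusing the Event type defined above):

    -- Reify the constructor as an integer tag:
    eventType :: Event -> Int
    eventType (ClickEvent _ _) = 0
    eventType (PaintEvent _)   = 1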


Yep, sorry for the edit. I did the arrays example but then noticed it was not really an answer to your question, since it can be done with sum types in almost the same way.


Hi everyone, similar to the link, I wrote a tech memo about algebraic data types to share what I learned while studying them.

https://kstreee.github.io/techmemo/algebraic.pdf


For Python (as a library, not a core language construct):

https://github.com/radix/sumtypes/

(Same year - 2015 - as the original post ;)


If you're looking for a simple sum type without associated value, the standard library's enum works quite well: https://docs.python.org/3/library/enum.html


If you're using mypy, there's also the Union type https://mypy.readthedocs.io/en/latest/kinds_of_types.html#un...


Since this article was written, C++17 has added std::variant and things like std::any. But his complaint about C++ matching being verbose is really a complaint that C++ typically doesn't have a first-class type system (unless RTTI is used, and even then you can't create new types at runtime). This lack is either a limitation or a benefit depending on your objectives for the program you are developing.

There isn’t a One True Approach; I do all my work in Common Lisp or C++ and although both are multi-paradigm languages they aren’t substitutable for each other.


What is the difference between these 'sum types' and Swift's enum type?

Also I think the name 'sum type' is bad. Nothing gets added or summed up, just combined and selectively chosen. For this reason I find supercharging 'enum', like Swift does, a more logical solution.


Swift enums are sum types.

Various languages use different keywords to access the same concept:

- Rust: 'enum'

- Swift: 'enum'

- Haskell: 'data'

- F#: 'type'

- OCaml: 'type'

> Also I think the name 'sum type' is bad. Nothing gets added or summed up, just combined and selectively chosen. For this reason I find supercharging 'enum', like Swift does, a more logical solution.

They're the same solution...

In any case, the name sum comes from viewing types "algebraically" as the article discusses. Doing addition of types with a sum type has a lot of similar properties to doing addition of numbers, and similarly doing multiplication of types with a product type (tuple, struct or class, in Swift) has a lot of similar properties to doing multiplication of numbers.


in case it's useful for anyone who has ever dealt with tagged pointers,

- C: 'union'

If you've ever dealt with an AST, a sum type is really a wonderful way to represent the nodes. In C you probably want some sort of 'tag' to indicate the actual type of the union. Other languages handle that bookkeeping for you.

for java i'm tempted to say 'interface' but that doesn't quite have the flavor of what you're driving at. definitely in the same spirit though.


C++: union or boost::variant or (C++11) std::variant


And Scala (via Dotty) is being re-worked to have sum types with a cleaner "enum" syntax soon too: https://github.com/lampepfl/dotty/issues/1970 (merged)


why would you want to view a programming language algebraically?


It lets you reuse your existing knowledge about algebra to transform programs. For example if you have a data type that has two different cases (= sum) each of which has a bunch of fields (= factors in a product) some of which are shared (= common factors in a sum of products) then you can literally factor them out, just like you can factor A * B * C + A * D * C into A * (B + D) * C.
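A Haskell sketch of that factoring, with placeholder types A, B, C, D invented for the example; the two forms are isomorphic:

    data A = A; data B = B; data C = C; data D = D

    -- A*B*C + A*D*C: a sum of products with common factors A and C.
    data Unfactored = Mk1 A B C | Mk2 A D C

    -- Factored as A * (B + D) * C.
    type Factored = (A, Either B D, C)

    factor :: Unfactored -> Factored
    factor (Mk1 a b c) = (a, Left b,  c)
    factor (Mk2 a d c) = (a, Right d, c)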


Conjunction and disjunction are more apt analogies that have the same properties. There are also no division types, while subtraction is used very rarely.


I didn't think to call it that at the time, but I think that I came across a division type in react-redux the other day: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/8f8f...

Playing with that a little, if adding a property to an interface is a product, eg:

    interface Foo {
        foo: string;
        bar: number;
    }
where Foo is a product of string and number, then removing a property from an interface is division:

    type Diff<T extends string, U extends string> = ({ [P in T]: P } & { [P in U]: never } & { [x: string]: never })[T];
    type Omit<T, K extends keyof T> = Pick<T, Diff<keyof T, K>>;

    interface Foo {
        foo: string;
        bar: number;
    }

    interface Bar {
        bar: number;
    }

    type Out = Omit<Foo, keyof Bar>;
the output type Out is equivalent to:

    interface Out {
        foo: string;
    }
or phrasing it another way:

    Out = Foo / Bar = (string * number) / number = string
I can't think what a 1/number type would be used for other than to remove number from a T*number; in other words, I can't think what a rational type would be used for unless it simplified down to a "normal" type. But I wouldn't like to bet that there's no other use :-)


What is the set theory intuition of that? If product is intersection, division is...?


Product is not intersection. The equivalent to product types in set theory is the Cartesian product ( https://en.wikipedia.org/wiki/Cartesian_product ).

The closest thing to a division would be a quotient set ( https://en.wikipedia.org/wiki/Quotient_set ), but there you're dividing by an equivalence relation. It is however possible to define an equivalence that undoes set multiplication: (A × B)/~ ≅ A if (a₁, b₁) ~ (a₂, b₂) holds iff a₁ = a₂, ignoring the other component.


That isn’t how product (intersection) types work in TypeScript. If we are talking about TypeScript, of course.


Product and intersection are different things in TypeScript, as well. The product of string and number would be a type like { first: string; second: number }, which combines two different values into one; whereas the intersection string & number is the type of all values that are both strings and numbers.


Right, but if we are going to call union types sum types, why aren’t intersection types called product types? Anyway, this is why the whole sum type thing breaks down and union is more apt, since we can describe A|B and A&B using the same terminology family.


A | B and A + B are only "the same" (but not identical) if A and B are disjoint (i.e. there is no value that is in both types). That's why + is also called disjoint union. You can simulate + with | by introducing artificial tags to make A and B disjoint, but in the end they are different operations. TypeScript doesn't really have first-class support for sum types because it needs to remain interoperable with JavaScript, so this simulation is the closest you are going to get.


I agree, so it really isn’t an example of the popularity of sum typing. Typescript does have support for user-supplied tagging, so you can also approximate it to some extent.


Yea, support isn't first class. When I was learning to use union types in TypeScript, initially I wrote custom type guards to distinguish them, using any property that was unique to a particular constituent type to differentiate them.

Nowadays I consider that a mistake, and all unions have a discriminator property, and there is no need for custom type guards (exceptions would be when writing .d.ts files for JS that uses them untagged, that sort of thing) since it makes the code much more explicit.


> Conjunction and disjunction are more apt analogies that have the same properties.

Conjunction and disjunction also have other properties, like idempotence: A /\ A = A, which by analogy would suggest that a tuple of two integers is the same as a single integer. Similarly, A \/ A = A, which is exactly the problem mentioned by the featured article that is the difference between union types and proper sum types (aka tagged union types).

So while sum and product may not be great analogies, conjunction and disjunction seem worse.


A lot of academic programming language research is an outgrowth of mathematics, so it uses the language of mathematics.

E.G "boolean" dates back to George Boole, The Mathematical Analysis of Logic (1847)

https://en.wikipedia.org/wiki/Boolean_algebra#History

and "lambda" to Alonzo Church in the 1930s

https://en.wikipedia.org/wiki/Lambda_calculus



This would be more useful and more upvoteable if you had provided some text for context.

TLDR: If you interpret sum types as + and product types (tuples/structs/objects) as *, type definitions can be read as polynomials. Recursive type definitions, like for linked lists, can be read as recurrence equations. If you do some calculus over these things and compute the derivative of these type formulas, you get back something meaningful: The formula corresponding to the type's Zipper (https://en.wikipedia.org/wiki/Zipper_(data_structure)), a data structure for efficiently traversing and modifying functional data structures.
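For one concrete instance (a sketch): lists satisfy L(a) = 1 + a·L(a), i.e. L(a) = 1/(1 - a), and differentiating gives L'(a) = L(a)², a pair of lists. That pair is a list's one-hole context; add the focused element back and you get the familiar list zipper.

    -- Elements before the focus (nearest first), the focus, elements after.
    data ListZipper a = ListZipper [a] a [a]

    -- Shift the focus one step to the right, if possible:
    right :: ListZipper a -> Maybe (ListZipper a)
    right (ListZipper before x (y : after)) = Just (ListZipper (x : before) y after)
    right _                                 = Nothing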

I like Chris Taylor's series of posts on this topic, as well as this shorter one: https://codewords.recurse.com/issues/three/algebra-and-calcu...


This is what leads to things like monads appearing as programming constructs; they provide neat abstractions which allow you to reason about code at a higher level than you otherwise could.


The numbers of values in the types are what get summed up. The sum type `Boolean | Byte` has 2 + 256 unique values, while the product type `(Boolean, Byte)` has 2 * 256 unique values.


"Sum" is a word used throughout maths for "coproduct". They could have used "coproduct type", but then people might have started complaining that they were being exposed to category theory ;)


> Also I think the name 'sum type' is bad. Nothing gets added or summed up

If you view types as sets of values, then a sum type is the addition (disjoint union) of two sets, while a product type is the (Cartesian) product of two sets.

> For this reason I find supercharging 'enum', like Swift does, a more logical solution.

Both are logical, they just come from different trains of thought.

Extending "enum" comes from "developers" viewing sum types as extensions of C-style enums, sum types comes from computer science and set theory.


Sum type comes from the fact that the resulting type has as many possible values as the types it is combining, summed. This is in contrast to product types like tuples and structs.

    Either<u8, bool>
This has exactly 256 + 2 possible values.

    Tuple<u8, bool>
This has exactly 256 * 2 possible values.

Of course, this simple numerical interpretation breaks down as soon as you have something like a linked list, which has infinitely many possible values, but even then you can fudge the rules a bit and get useful information out of the math. But that is the core intuition.


https://www.youtube.com/watch?v=ojZbFIQSdl8&feature=youtu.be... is a nice explanation for curly-brace programmers of why it's called a sum type (vs structs, which are product types)


The name has its origin in type theory (you also have product types). It's not the best name I guess, but not the worst either: If we have type A with n terms and type B with m terms, then the sum type of A and B has n + m terms.

There are much more confusing things in type theory, like the ambiguity that the word 'type' has (I still don't grok the terminology, to be honest).


> Also I think the name 'sum type' is bad.

The intuition is that for any types A and B with numbers of elements |A| and |B|, the sum type A+B has |A|+|B| elements.


A sum type is the more general name for a Swift/Rust enum.


the category theory term is co-product

(which honors the convention of expressing a dual using "co-" and indeed co-product is the dual of product)


Of course there's nothing new under the sun; this is one of the features of Algol 68 (1968) that Wirth discarded when he made Pascal.

They're not coming; they've been and gone, probably a couple of times.


All I know about Algol 68 is what I just gleaned from Wikipedia, but there I got the impression that what it had was a limited form. Namely, the type indexed things mentioned in the featured article as present in C++ and D, where you don't have tags and thus cannot usefully distinguish different variants with the same underlying type.


Nope, A68 unions had hidden tags (strictly speaking the spec didn't say they had tags, but that was the only obvious implementation). It did have a case statement to switch on the tags and extract the values as described above.


I prefer the C++ name, variant; super useful and a new addition to C++17.


Also, std::optional, which is analogous to Maybe.

Of course, both have been in boost for some time now. C++ so far has been doing this at the library level. But it remains to be seen if they will make their way into actual interfaces, for example in standard algorithms.


Sum types are not all that different from subtyping+inheritance; they're simply the other side of the expression problem.


Unlike subtyping, sum types are bounded - you can't add new variants without modifying the originally-defined type.

That's a huge difference in terms of code architecture - it enables things like exhaustive pattern matching, and allows the programmer to enforce stronger invariants in the code.


Subtyping can be closed, too (see Scala's sealed classes, for example) and algebraic data types can be open (see OCaml's extensible variants [1] or polymorphic variants [2]).

This is simply a distinction between closed and open sum types.

[1] https://caml.inria.fr/pub/docs/manual-ocaml/extn.html#s%3Aex...

[2] http://caml.inria.fr/pub/docs/manual-ocaml-400/manual006.htm...


Actually, they're the "other side" of product types, which are sometimes called "records" or "structs". Maybe they're occasionally used to solve the same problem, but inheritance is unrelated. Inheritance is a variety of ad-hoc methods for sharing code, with quite different semantics in every language where it's implemented.


Other side?


http://wiki.c2.com/?ExpressionProblem

> The "expression problem" is a phrase used to describe a dual problem that neither ObjectOrientedProgramming nor FunctionalProgramming fully addresses.

"dual problem", meaning you choose ObjectOrientedProgramming or FunctionalProgramming, which neither fully address.

neither "side" is able to resolve the ExpressionProblem


Interesting, though I'm not sure I like the notion that it's a two sided coin. I've also never been a fan of the OO vs FP argument.


Do we really need a new name for this? It's basically "safe C-style unions", but pretty much everything in C is unsafe, so you might as well just call it "unions".


This is not a new name... in the theory of algebraic data types there is a model of what it means to add and multiply types, resulting in "sum" and "product" types. I am not 100% sure but I believe these terms have been used since the early 70s.

The term "union" as used in C is highly related but not quite the same. If you add a tag to the union (and so have a "tagged union", often represented in C as a struct with an enum and a union) then you are equivalent in an information sense (but the usage is still different, as a C union says something very specific about how memory is used).


As mentioned elsewhere, 'union' originally came from Algol 68, where (eventually) they were implemented exactly as described in the article.


It's not a new name, just one you're unfamiliar with.


Names are evil, abolish yourself of all naming and achieve true enlightenment!


Could you please stop posting unsubstantive comments to Hacker News?

https://news.ycombinator.com/newsguidelines.html


I wouldn't exactly call this unsubstantive... though maybe it was a bit aggressive towards the OP.


I sure hope so


same <3



