Sum Types Are Coming (2015) (chadaustin.me)
113 points by signa11 on Feb 16, 2018 | 113 comments



Once you get the hang of algebraic data types (ADTs), definitely check out generalized ADTs (GADTs). They reduce the amount of code required to express a solution, make the type system do a ton of work for you to prevent bugs from compiling, and make it easier to embed domain-specific languages in your code. A lot of the GADT tutorials out there are a little abstract for my taste and I wish there were more examples with tangible applications, but this one is decent:

http://joeyh.name/blog/entry/making_propellor_safer_with_GAD...


If you want to have fun, I also have a small tutorial on how to make a type-safe printf function with GADTs (and that technique is actually used in the OCaml stdlib!)

https://drup.github.io/2016/08/02/difflists/

I also have some slides (more entertaining and probably a better introduction) here:

https://www.irif.fr/~gradanne/papers/gadts.pdf


Why do they reduce the amount of code?

They add the ability for constructors to carry constraints, which aids expressivity in some cases.

But the majority of GADT uses purely add safety. You can use an ordinary ADT without a type variable correlated with the data constructor, and crash at runtime if the constructors don't match.


I thought GADTs would be useless to me when I first heard them explained as imposing additional constraints on ADTs. Whilst technically true, that explanation misses the point.

The real power of GADTs comes from the fact that you can use the type at compile time. This means that in OCaml I can do something like:

  type 'a t =
  | Int : int t
  | Float : float t

  let print : type a. a t -> (a -> string) =
    function
    | Int -> string_of_int
    | Float -> string_of_float

I can then do (print Float) and receive string_of_float. I can do (print Int) and receive string_of_int. Both have drastically different types, but due to the constraints that the GADT imposes it is fully type safe and the type is known at compile time.

Although this is a slightly contrived example, there are nearby scenarios where this proves immensely useful and where plain ADTs fall short. Think serialization: with an ordinary ADT you would first have to convert your values into the ADT's representation, which defeats much of the purpose. If you're interested in something practical, take a look at [0] (shameless plug), a very early alpha version of a similar idea allowing conversion to and from JSON.

EDIT: this is not inlined as I recall. False memory. The reason it isn't is probably because you can use existential types in which case the type would not be known at compile time.

[0] https://github.com/dpwm/typespec


> Why do they reduce the amount of code? [...] crash at runtime if the constructors don't match.

You answered yourself: If you cannot have these runtime type mismatches, you don't need to write the corresponding bookkeeping and checking code.


You don't have to write it - you just have to avoid -Wall :-)



The last couple of years I've been doing a lot of exploring of different languages and language features. Sum types are one of the few features I've come across that seem to have the potential for large, practical benefit. Not only do you get the potential to eliminate null from a language with Option or Maybe types, but many cases where you might use multiple classes and files to express some hierarchy can be replaced with just a few lines of code that are so easy to read you can even show them directly to the non-programmers working on the project, to work through the definition together.

This works especially nicely when the syntax is built into the language in a succinct way, as in F#/OCaml (and possibly others as well!)
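To make that concrete, here is a minimal Haskell sketch (type and function names invented; the same shape reads almost identically in F# or OCaml). The whole "hierarchy" is a few readable lines, plus one function consuming it:

    data Shape
        = Circle Double            -- radius
        | Rectangle Double Double  -- width, height

    area :: Shape -> Double
    area (Circle r)      = pi * r * r
    area (Rectangle w h) = w * h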


It's also a convenient way to implement "value-based" state machines (where the entire machine is a single type with a value for each state, as opposed to type-based machines, where each state is implemented as a different, and possibly completely incompatible, static type).
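A minimal sketch in Haskell, with invented states for a network connection: the entire machine is one sum type, one constructor per state, and a transition is just a function on that type.

    data Conn
        = Disconnected
        | Connecting Int   -- retry count
        | Connected String -- session id

    -- One possible transition function (the state shapes are made up):
    step :: Conn -> Conn
    step Disconnected   = Connecting 0
    step (Connecting n)
        | n < 3         = Connected ("session-" ++ show n)
        | otherwise     = Disconnected
    step c              = c  -- Connected stays put in this sketch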


+1. They are a huge help for managing things like connection state. You can create powerful, statically-enforced invariants, like that a lifetime of a socket is 1:1 with a state of your state machine.


Note that TypeScript and Flow have implemented sum types since this post was published.


The lack of a match construct or implicit tagging to go along with them makes it really weird though.


No, they haven't. Union types and sum types are semantically very different from each other.



Sum types are a universal construction: you take any two types, and produce yet another type. In particular, you can sum a type with itself. Disjoint unions give you nothing of the sort.

In other words: learn to mathematics.


The tone of the parent comment is a bit aggressive, but the content is relevant.

Just in case it's not clear, the union of sets "merges" elements that are identical whereas in the (direct) sum each element gets tagged with where it comes from.

Formally, we have a set that indexes over the sets we want to sum, and then we just Cartesian product each set with its index before unioning. In abstract nonsense we usually write

A ⊕ B = (A × {0}) ∪ (B × {1})

where ⊕ is the direct sum operator. So the direct sum of a set with itself is quite different from a plain union. For more than two sets we just index over things in the obvious way and write

⨁ᵢ Aᵢ = ∪ᵢ (Aᵢ × {i})

What parent means by universal property is that any collection of functions {fᵢ : Aᵢ → C} can be decomposed into the injections {gᵢ : Aᵢ → ⨁ᵢ Aᵢ} and a single, unique g : ⨁ᵢ Aᵢ → C where fᵢ = g ∘ gᵢ.

With a little thought, the universal property above seems obvious, and with a little more work, you can see that it is actually equivalent to the definition of direct sum above.

Thus we can actually take this universal property as the definition of direct sum and derive the original. The cool thing is that the universal property doesn't refer to set-specific operations---just functions/maps---so we actually have a definition for the direct sum of things more general than sets. That is, we can form the direct sum of things like monoids, groups, certain classes of differential functions, etc.
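In Haskell terms, a sketch of the binary case: `Either` plays the role of the direct sum, `Left`/`Right` are the injections gᵢ, and the Prelude's `either` builds the unique mediating map.

    -- For any f1 and f2 into a common codomain c, `either f1 f2` is the
    -- unique g with g . Left == f1 and g . Right == f2.
    mediate :: (a -> c) -> (b -> c) -> (Either a b -> c)
    mediate = either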

I could go on, but this road leads to madness^W category theory.


> Just in case it's not clear, the disjoint union of sets "merges" elements that are identical whereas in the (direct) sum each element gets tagged with where it comes from.

That's just a union. The disjoint union (https://en.wikipedia.org/wiki/Disjoint_union) is what you're calling the direct sum. The direct sum (https://en.wikipedia.org/wiki/Direct_sum) is an operation on algebraic structures to create new algebraic structures (e.g., one can direct-sum two groups to get a new group, and similarly rings, vector spaces, etc.), not just sets.


Whoops. That one slipped past. Thanks.

In case it's not clear, I just specialize tensor sum to the category Set; that's all. At the end I do mention how the tensor sum applies on a wider class of objects. In fact, any Cartesian category has (finite) tensor products, even if the objects aren't traditionally thought of as "algebraic structures", e.g. certain classes of topological spaces, Frolicher manifolds, etc.


A union type `A | B` in Flow is disjoint if and only if the types `A` and `B` are a priori known to be disjoint. There is no equipping `A` and `B` with fresh new tags to force them to be disjoint.


... Which isn't relevant to the clarification of mathematical terms in my comment.


It is relevant, because it means that while `|` may produce a disjoint union if its arguments are already disjoint, it is not guaranteed to produce a disjoint union in the general case. Hence, my original comment had a point.

As if that weren't enough, any talk about universal properties is utterly pointless as soon as object identities matter everywhere.


> In abstract nonsense we usually write

> A ⊕ B = (A × {0}) ∪ (B × {1})

Hardly nonsense! This makes clear the connection to sets in a way that words weren't doing for me.


> In other words: learn to mathematics

That breaks the HN guidelines by crossing into incivility. You'd stopped doing that on HN, which is great, so please don't resume.

This comment would be better if it (a) dropped the rude bit and (b) made the distinction from disjoint unions more clearly.

https://news.ycombinator.com/newsguidelines.html


> In particular, you can sum a type with itself.

The way you sum a type with itself in Haskell would be something like

  data Response = Success { result :: String }
                | Failure { reason :: String }
where each component of the sum is uniquely tagged.

From my understanding of the Flow documentation, the equivalent would be

  type Response = {| success: true, result: string |}
                | {| failure: true, reason: string |};
where the only material difference is that Flow needs to work with JavaScript, which doesn't have a built-in tagging mechanism, unlike Haskell.

So although sum types and disjoint unions are different concepts, they can easily be used to simulate each other.


What you propose breaks completely as soon as you want to define and use abstract types.


I think it should be possible. Can you give an example that can't be translated?


Union and intersection types make the following assumptions:

(0) There is a single physical universe that contains all values.

(1) The role of types is to “carve out” fragments of this universe.

(2) Say you have two values `x` and `y` of types `A` and `B`. Because it is possible to form the union type `A | B`, to which both `x` and `y` belong, the only way to make sure `x` and `y` cannot be conflated with each other is to make them physically distinct elements of the universe.

On the other hand, abstract types make the following assumptions:

(0) Each type is a “world” of its own, defined in terms of its logical structure. The values of a type are a mere consequence of this logical structure. Hence, it doesn't make sense to try to “split” or “merge” types.

(1) It doesn't make sense to argue that an int is different from a string, because it's just as nonsensical as arguing that they are equal. What they are is incommensurable.

(2) Say you have two different abstract types `A` and `B`. Because they could share an underlying representation type `T`, you cannot rely on values of different types having physically distinct representations.

---

Suppose we had a language with both union and abstract types. The contradictions between the assumptions behind union and abstract types would lead to hilarious consequences:

    signature ERROR_CODE =
    sig
      type error_code
      
      (* specification of operations on error codes *)
    end
    
    (* we use opaque ascription, so that the internal representation
     * of error codes becomes inaccessible to external parties *)
    structure ErrorCode :> ERROR_CODE =
    struct
      type error_code = int  (* prosaic representation *)
      
      (* implementation of operations on error codes *)
    end
    
    type union = ErrorCode.error_code | int    (* union *)
    
    (* suppose we also have flow typing so that this example works *)
    fun test (u : union) =
      if u is ErrorCode.error_code then 
        ErrorCode.printErrMsg u
      else
        printInt u
    
    val _ = test 42
You'd expect this code to print the integer `42`, but it actually prints:

    ERR_DOUGLAS_ADAMS : The answer to the Ultimate Question of Life,
      the Universe and Everything is the number forty-two.
Wasn't the fact that error codes are internally represented as ints supposed to be unknowable to external parties?


Ah, I thought you meant that my construction would behave differently from a sum type if abstract types were involved, but it appears that you are actually worried that having non-disjoint unions available breaks abstract types by revealing their implementation details.

I agree that being able to change the internals of a module without breaking external code is a good thing, but unlike other security guarantees, abstract types can break merely by having a stronger type system which is able to prove strictly more things (such as whether two types are equal). That makes implementing completely leak-free abstractions quite difficult.

It can even be hard to tell which language feature exactly causes the leak. In your example, it is not just the existence of the union type that causes the problem, but the operation "is T", which you probably assume to be of type "forall a. (a | T) -> Bool" or something. Obviously, that operation is doing all the work of looking at the underlying implementation and leaking it.

While you couldn't meaningfully type such an operation without union types, having unions in your type system does not mean that such a function will be available. Although working with unions without being able to tell the types apart would be quite inconvenient, you could nonetheless remove that ability from the language, either completely (effectively requiring all unions to be disjoint) or by having special markers to enable or disable it (similar to Haskell's type class constraints).

Flow doesn't go to such great lengths because it is still built on JavaScript objects, which don't really have a concept of privacy. If there were private symbols (i.e. unique properties which can only be accessed by the module which defines them), you could trivially implement abstract types similar to Haskell's newtype wrapper as { [some private symbol]: implementation-type }. Until such a thing is added, you'll have to limit yourself to disjoint unions to avoid abstraction leaks.


> abstract types can break merely by having a stronger type system which is able to prove strictly more things (such as whether two types are equal)

Indeed, which is why I'm not such a huge fan of overly powerful type systems.

> That makes implementing completely leak-free abstractions quite difficult.

It means the type system has to have the right amount of expressive power. Too little, and separation of concerns becomes difficult. Too much, and separation of concerns becomes difficult too!

It also means that we can't rely on types to prove everything we would like to prove about programs. This seems intuitively right, because type systems are first and foremost meant to be mechanically checkable, not to be flexible enough to express every possible software requirement.

> In your example, it is not just the existence of the union type that causes the problem, but the operation "is T", which you probably assume to be of type "forall a. (a | T) -> Bool" or something.

This is correct. But removing this `is T` operation would greatly reduce the appeal of union types in practice.

> you could trivially implement abstract types similar to Haskell's newtype wrapper as { [some private symbol]: implementation-type }.

Both Haskell's `newtype` and your proposal are poor replacements for actual abstract types.


Disjoint union in mathematics is exactly the universal construction you're calling sum types: https://en.wikipedia.org/wiki/Disjoint_union

(The Flow use of the name doesn't seem to quite match the mathematical definition.)


Confusingly, there are two different things called disjoint union in mathematics. One is what you mentioned, which is indeed the coproduct in the category of sets. But another one is simply the union of two disjoint sets, which might not be too common in programming, but is used quite a lot, say, in analysis and point-set topology. Flow's use of the term obviously refers to the latter.


The former is a (small) generalization of the latter, tacking on a tag to forcibly ensure the sets are disjoint. It's not worth thinking of them as different.

In any case, Flow doesn't require that the sides of the | are different.


Um, re-read the documentation:

> These disjoint unions are made up of any number of object types which are each tagged by a single property.

It is clearly a disjoint union in the set-theoretical, not category-theoretical sense.

Let's see their own example:

    // @flow
    type Success = { success: true, value: boolean };
    type Failed  = { success: false, error: string };
    
    type Response = Success | Failed;
    
    function handleResponse(response: Response) {
      if (response.success) {
        var value: boolean = response.value; // Works!
      } else {
        var error: string = response.error; // Works!
      }
    }
Clearly, this union is disjoint because the types `Success` and `Failed` were a priori known to be disjoint.


It doesn't require that. Flow seems to accept all sorts of variations on the following (including X | X):

   type X = { foo: string }
   type Y = { foo: string, bar: string }

   type Foo = X | Y
(Y is a subset of X in that case.)


But then it isn't disjoint.


How do they differ?


The Union of Int and Int is exactly Int; the Sum is not.

The union of two types contains all values that are in either component type. The sum of them contains all of the values of the first type, each one associated with a tag like “I came from the left side” plus all of the elements from the second type with a similar tag.

The Union of Bool and Bool has 2 values, the Sum has 4.

If we define Maybe[A] by unioning a "null" value onto the values of A, then Maybe[Maybe[A]] = Maybe[A]. If it's defined with a Sum, then that equation doesn't hold.
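A quick Haskell illustration of that last point; with a sum-based Maybe the two layers of "nothing" stay distinct, which is exactly what a plain union would collapse:

    outer, inner :: Maybe (Maybe Int)
    outer = Nothing      -- no result at all
    inner = Just Nothing -- a result was produced, and it was empty
    -- A union-based Maybe would identify these two values,
    -- collapsing Maybe (Maybe a) into Maybe a.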


Nomenclature varies, but I'm guessing the author of the comment is using 'union' as the sum analog of tuples (unnamed fields) and 'sum' as the sum analog of structs/records (i.e. named fields).


Sum types are one of the best things, as they empower very good `match` style conditionals.


I can attest to this. I've just done a project in F# -- my first experience using a statically typed functional language beyond messing around in a REPL -- and the expressive power of this combination is intoxicating. I may have gone overboard; I found it hard to write procedural code in F#, so I split up my procedures into steps, with a type to model each step and a sum type to model the whole procedure. Coming from C / Python, where I was used to procedural code mostly working most of the time and then failing in hilarious ways, this was a revelation. A language where I can make the compiler check my procedural logic for me! Liberating!


> I may have gone overboard; I found it hard to write procedural code in F#, so I split up my procedures into steps, with a type to model each step and a sum type to model the whole procedure.

State machines are a great thing, and modelling a state machine as a sum type is reasonably common (it's not the best modelling for all state machines, but it's very good for many).


I'm glad to see I'm not totally crazy for adopting that pattern! As I'm new to the language, I'm not sure whether I'm holding it right.


Does F# have a good REPL?

Which ML-style language has the closest thing to a Lisp-style REPL?


Not saying it's the closest to what you have in mind, but you could look at the Emacs Tuareg mode for OCaml. Here is a reference card: http://www.ocamlpro.com/files/tuareg-mode.pdf

If you are a SLIME fan, you might notice that some of the key bindings to interact with the toplevel (the OCaml word for "REPL") are identical in Tuareg.

That said, although this will sound tongue-in-cheek: Having used Isabelle and Coq with their interactive editors, I find their approach better than a REPL. In these systems, you execute code directly from your source file, as opposed to (conceptually) sending snippets of code to a REPL. There is no concept of code you have written in the source (or the REPL) which is not synced up with the state of the REPL (or the source code, respectively). I wish other languages had modes like this. There seems to be one for Prolog, but I've never used it and don't know how well it works in practice: https://www.metalevel.at/ediprolog/


I've been happy with it. It's a good place to isolate your type issues and figure out how to plug things together. It helped me learn a lot about how F# does its type inference.


Hopefully flow typing will be next. It's a really cool idea and I'm surprised it hasn't caught on more.

https://en.wikipedia.org/wiki/Flow-sensitive_typing


One of my favourite features of Flow is the ‘*’ type and having Flow handle the given type based on control flow via “if” checks. You can achieve it somewhat in TypeScript too, IIRC. Neither are quite as powerful as I would like, but then what I would like is basically magic!


Oooh, that concept has a name! This has been on the list of features of my dream programming language for a long time, but I didn't know this has a name and Wikipedia article and everything.


Are they really coming? As a student, I first encountered them in Ada 83 (discriminated records) in 1990, then in Gofer (a variant of Haskell developed at Oxford) in 1992, then in Caml Light (the ancestor of OCaml) the same year.


I think maybe "coming to the mainstream" fits better. I too saw it in Ada in school, and of course Haskell and friends have had them for ages.

There does seem to be a recent trend of popular, highly visible languages adding the feature, though. But I think the biggest thing is that the pattern-matching syntax that makes them so easy to use is becoming more common.

You can have Sum types without pattern matching, and vice versa, but it seems to be much more common now to expect any new language to include pattern matching.


One thing I really need is getting at "the type field". This is useful information by itself, and I want it most of the time - but I think in Haskell it's not possible to get at it without manually defining a "type type" and a function that computes that type from a given sum-typed value.

I don't think that lambdas/closures have such a problem in most programming languages. It's still unclear how to handle function pointers for lambdas/closures, but it should be a lot easier to make a lambda into a real function than changing a datatype definition when it's already used across the codebase.

So, I think sum types are really another one of those ideas which may prevent some bugs sometimes, but in the end are not generally super useful; they are a little too clever and end up causing more work than they save. At least unless there's a well-defined representation from which we can extract the type information, in which case I'd probably be happy to use them.


"but I think in Haskell it's not possible to get at it without manually defining a "type type" and a function that computes that type from a given sum-typed data."

You can do it with reflection, which somewhat confusingly Haskell calls "generics": https://wiki.haskell.org/GHC.Generics It's very old terminology, you'll have to forgive them. If you carefully trace the etymology back to the 1990s-ish when Haskell established that term, and trace the academic ideas of "generics" back that far, you can actually find the similarities with the old academic ideas of generics and what we today call reflection, but nowadays they are so separated it's really just of historical interest.
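As a sketch of the flavor, using the closely related Data.Data reflection rather than GHC.Generics (it's shorter to show; the Event type is invented for the example):

    {-# LANGUAGE DeriveDataTypeable #-}
    import Data.Data (Data, toConstr)

    data Event = ClickEvent Int Int | PaintEvent Int
        deriving Data

    -- Recover "the type field" by reflection instead of by hand:
    eventTag :: Event -> String
    eventTag = show . toConstr

    -- ghci> eventTag (PaintEvent 42)
    -- "PaintEvent"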

That said, very little code uses it, because generally in Haskell whenever you would want to do "OK, look at the type of a thing, then decide what to do based on the type, then do the thing" that almost always can be collapsed down to "Pattern match on the thing, then do the thing you wanted to do". You write what it is you want to do based on types into the pattern matching itself, collapsing the middle step where you have an explicit value representing the type into implicit knowledge contained in the branch of the pattern match that occurs.

I do understand why you are thinking that way and am not trying to say it's a bad thing in general. While it's not the first thing I reach for in my Go code, I do have some very reflection-heavy Go code that works that way. But in Haskell, there's almost (but not quite) always a better solution. If you could program the way you are discussing in Haskell, you'd find the type checker would be an even bigger jerk to you than it can be when you program the "right" way; I think you'd have a hard time making that code work with GADTs, and GADTs are very, very common in Haskell.


Could you give an example in a (possibly hypothetical) programming language of your choice of what you mean by "the type field", and what you would do with it? Maybe we could point you to something that does what you want.

Though I do wonder if reflection on type information (at runtime?) would be useful and easy for you if you find basic algebraic data types "a little too clever" and "causing more work than they save".


Regarding the type field, look for "enum EventType type;" in the article. (In practice I would not use "enum EventType", but simply "int" - but anyway).

What would I do with it? Like the result of (x+y), or any other kind of data, I can use it for whatever I need. I might simply want to count the number of occurrences of each "type". Often I'm simply not interested in the data besides the "type", and in these cases I find that case-matching is a lot of boilerplate. And it doesn't reify the "type" to authoritative runtime values. Which, as you say, might be done with reification support from the language, but I don't want to bother with that -- and it would not allow me to choose the possible values of the type field, which might be a requirement if the options are defined somewhere else.


> Regarding the type field, look for "enum EventType type;" in the article.

OK. If I understand correctly, you want to use that field in other contexts than doing a switch on it, whereas in Haskell the usual way of consuming the type tag is to do such a switch, as in the article's example:

    case event of
        ClickEvent x y -> <do click stuff>
        PaintEvent color -> <do paint stuff>
        -- there are no other possible cases, so no default case needed
> For example, I could separate a large array of sum-type data into separate arrays, and also process them separately.

Sure, but you can do that with a case expression as above. Here is a complete example:

    data Event = ClickEvent Int Int | PaintEvent Int

    -- separate a list of Events into two lists, the first holding the payloads
    -- of the click events, the other the payloads of the paint events
    separate :: [Event] -> ([(Int, Int)], [Int])
    separate [] = ([], [])
    separate (event : events) =
        case event of
            ClickEvent x y -> ((x, y) : clicks, colors)
            PaintEvent color -> (clicks, color : colors)
        where
            (clicks, colors) = separate events

    example = [ClickEvent 1 2, PaintEvent 42, ClickEvent 3 4, PaintEvent 23]
Running it:

    > separate example
    ([(1,2),(3,4)],[42,23])
You can now go ahead and process the list [(1,2),(3,4)] separately from [42,23].

(Depending on the exact specification and performance requirements, the code might be written a bit differently in the real world.)

EDIT: Ah, you edited your post while I was writing. Oh well. Regarding reifying constructors as integers or whatever, yes, you might have to write a two-line function to do that.
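For instance, something like this (a sketch reusing the Event type defined above):

    -- Reify the constructor as an integer tag:
    eventType :: Event -> Int
    eventType (ClickEvent _ _) = 0
    eventType (PaintEvent _)   = 1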


Yep, sorry for the edit. I did the arrays example but then noticed it was not really an answer to your question, since it can be done with sum types in almost the same way.


Hi everyone, similar to the link, I wrote a tech memo about algebraic data types to share what I learned while studying them.

https://kstreee.github.io/techmemo/algebraic.pdf


For Python (as a library, not a core language construct):

https://github.com/radix/sumtypes/

(Same year - 2015 - as the original post ;)


If you're looking for a simple sum type without associated value, the standard library's enum works quite well: https://docs.python.org/3/library/enum.html


If you're using mypy, there's also the Union type https://mypy.readthedocs.io/en/latest/kinds_of_types.html#un...


Since this article was written, C++17 has added std::variant and things like std::any. But his complaint about C++ matching being verbose is really a complaint that C++ typically doesn't have a first-class type system (unless RTTI is used, and even then you can't create new types at runtime). This lack is either a limitation or a benefit depending on your objectives for the program you are developing.

There isn’t a One True Approach; I do all my work in Common Lisp or C++ and although both are multi-paradigm languages they aren’t substitutable for each other.


What is the difference between these 'sum types' and Swift's enum type?

Also I think the name 'sum type' is bad. Nothing gets added or summed up, just combined and selectively chosen. For this reason I find supercharging 'enum', like Swift does, a more logical solution.


Swift enums are sum types.

Various languages use different keywords to access the same concept:

- Rust: 'enum'

- Swift: 'enum'

- Haskell: 'data'

- F#: 'type'

- OCaml: 'type'

> Also I think the name 'sum type' is bad. Nothing gets added or summed up, just combined and selectively chosen. For this reason I find supercharging 'enum', like Swift does, a more logical solution.

They're the same solution...

In any case, the name sum comes from viewing types "algebraically" as the article discusses. Doing addition of types with a sum type has a lot of similar properties to doing addition of numbers, and similarly doing multiplication of types with a product type (tuple, struct or class, in Swift) has a lot of similar properties to doing multiplication of numbers.


in case it's useful for anyone who has ever dealt with tagged pointers,

- C: 'union'

If you've ever dealt with an AST, a sum type is really a wonderful way to represent the nodes. In C you probably want some sort of 'tag' to indicate the actual type of the union. Other languages handle that bookkeeping for you.

for java i'm tempted to say 'interface' but that doesn't quite have the flavor of what you're driving at. definitely in the same spirit though.


C++: union or boost::variant or (C++11) std::variant


And Scala (via Dotty) is being re-worked to have sum types with a cleaner "enum" syntax soon too: https://github.com/lampepfl/dotty/issues/1970 (merged)


why would you want to view a programming language algebraically?


It lets you reuse your existing knowledge about algebra to transform programs. For example if you have a data type that has two different cases (= sum) each of which has a bunch of fields (= factors in a product) some of which are shared (= common factors in a sum of products) then you can literally factor them out, just like you can factor A * B * C + A * D * C into A * (B + D) * C.
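A Haskell sketch of that factoring, with placeholder types A, B, C, D invented for the example; the two forms are isomorphic:

    data A = A; data B = B; data C = C; data D = D

    -- A*B*C + A*D*C: a sum of products with common factors A and C.
    data Unfactored = Mk1 A B C | Mk2 A D C

    -- Factored as A * (B + D) * C.
    type Factored = (A, Either B D, C)

    factor :: Unfactored -> Factored
    factor (Mk1 a b c) = (a, Left b,  c)
    factor (Mk2 a d c) = (a, Right d, c)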


Conjunction and disjunction are more apt analogies that have the same properties. There are also no division types, while subtraction is used very rarely.


I didn't think to call it that at the time, but I think that I came across a division type in react-redux the other day: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/8f8f...

Playing with that a little, if adding a property to an interface is a product, eg:

    interface Foo {
        foo: string;
        bar: number;
    }
where Foo is a product of string and number, then removing a property from an interface is division:

    type Diff<T extends string, U extends string> = ({ [P in T]: P } & { [P in U]: never } & { [x: string]: never })[T];
    type Omit<T, K extends keyof T> = Pick<T, Diff<keyof T, K>>;

    interface Foo {
        foo: string;
        bar: number;
    }

    interface Bar {
        bar: number;
    }

    type Out = Omit<Foo, keyof Bar>;
the output type Out is equivalent to:

    interface Out {
        foo: string;
    }
or phrasing it another way:

    Out = Foo / Bar = (string * number) / number = string
I can't think what a 1/number type would be used for other than to remove number from a T*number; in other words, I can't think what a rational type would be used for unless it simplified down to a "normal" type. But I wouldn't like to bet that there's no other use :-)


What is the set theory intuition of that? If product is intersection, division is...?


Product is not intersection. The equivalent to product types in set theory is the Cartesian product ( https://en.wikipedia.org/wiki/Cartesian_product ).

The closest thing to a division would be a quotient set ( https://en.wikipedia.org/wiki/Quotient_set ), but there you're dividing by an equivalence relation. It is however possible to define an equivalence that undoes set multiplication: (A × B)/~ ≅ A if (a₁, b₁) ~ (a₂, b₂) holds iff a₁ = a₂, ignoring the other component.


That isn’t how product (intersection) types work in TypeScript. If we are talking about TypeScript, of course.


Product and intersection are different things in TypeScript, as well. The product of string and number would be a type like { first: string; second: number }, which combines two different values into one; whereas the intersection string & number is the type of all values that are both strings and numbers.


Right, but if we are going to call union types sum types, why aren’t intersection types called product types? Anyway, this is why the whole sum type thing breaks down and union is more apt, since we can describe A|B and A&B using the same terminology family.


A | B and A + B are only "the same" (but not identical) if A and B are disjoint (i.e. there is no value that is in both types). That's why + is also called disjoint union. You can simulate + with | by introducing artificial tags to make A and B disjoint, but in the end they are different operations. TypeScript doesn't really have first-class support for sum types because it needs to remain interoperable with JavaScript, so this simulation is the closest you are going to get.


I agree, so it really isn’t an example of the popularity of sum typing. Typescript does have support for user-supplied tagging, so you can also approximate it to some extent.


Yea, support isn't first class. When I was learning to use union types in TypeScript, initially I wrote custom type guards to distinguish them, using any property that was unique to a particular constituent type to differentiate them.

Nowadays I consider that a mistake, and all unions have a discriminator property, and there is no need for custom type guards (exceptions would be when writing .d.ts files for JS that uses them untagged, that sort of thing) since it makes the code much more explicit.


> Conjunction and disjunction are more apt analogies that have the same properties.

Conjunction and disjunction also have other properties, like idempotence: A /\ A = A, which by analogy would suggest that a tuple of two integers is the same as a single integer. Similarly, A \/ A = A, which is exactly the problem mentioned by the featured article that is the difference between union types and proper sum types (aka tagged union types).

So while sum and product may not be great analogies, conjunction and disjunction seem worse.


A lot of academic programming language research is an outgrowth of mathematics, so it uses the language of mathematics.

E.G "boolean" dates back to George Boole, The Mathematical Analysis of Logic (1847)

https://en.wikipedia.org/wiki/Boolean_algebra#History

and "lambda" to Alonzo Church in the 1930s

https://en.wikipedia.org/wiki/Lambda_calculus



This would be more useful and more upvoteable if you had provided some text for context.

TLDR: If you interpret sum types as + and product types (tuples/structs/objects) as *, type definitions can be read as polynomials. Recursive type definitions, like for linked lists, can be read as recurrence equations. If you do some calculus over these things and compute the derivative of these type formulas, you get back something meaningful: The formula corresponding to the type's Zipper (https://en.wikipedia.org/wiki/Zipper_(data_structure)), a data structure for efficiently traversing and modifying functional data structures.
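For one concrete instance (a sketch): lists satisfy L(a) = 1 + a·L(a), i.e. L(a) = 1/(1 - a), and differentiating gives L'(a) = L(a)², a pair of lists. That pair is a list's one-hole context; add the focused element back and you get the familiar list zipper.

    -- Elements before the focus (nearest first), the focus, elements after.
    data ListZipper a = ListZipper [a] a [a]

    -- Shift the focus one step to the right, if possible:
    right :: ListZipper a -> Maybe (ListZipper a)
    right (ListZipper before x (y : after)) = Just (ListZipper (x : before) y after)
    right _                                 = Nothing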

I like Chris Taylor's series of posts on this topic, as well as this shorter one: https://codewords.recurse.com/issues/three/algebra-and-calcu...


This is what leads to things like monads appearing as programming constructs; they provide neat abstractions which allow you to reason about code at a higher level than you otherwise could.


The numbers of values in the types are what get summed up. The sum type `Boolean | Byte` has 2 + 256 unique values, while the product type `(Boolean, Byte)` has 2 * 256 unique values.


"Sum" is a word used throughout maths for "coproduct". They could have used "coproduct type", but then people might have started complaining that they were being exposed to category theory ;)


> Also I think the name 'sum type' is bad. Nothing gets added or summed up

If you view types as sets of values, then a sum type is the addition (disjoint union) of two sets, while a product type is the (Cartesian) product of two sets.

> For this reason I find supercharging 'enum', like Swift does, a more logical solution.

Both are logical, they just come from different trains of thought.

Extending "enum" comes from "developers" viewing sum types as extensions of C-style enums, sum types comes from computer science and set theory.


Sum type comes from the fact that the resulting type has as many possible values as the types it is combining, summed. This is in contrast to product types like tuples and structs.

    Either<u8, bool>
This has exactly 256 + 2 possible values.

    Tuple<u8, bool>
This has exactly 256 * 2 possible values.

Of course, this simple numerical interpretation breaks down as soon as you have something like a linked list, which has infinitely many possible values, but even then you can fudge the rules a bit and get useful information out of the math. But that is the core intuition.


https://www.youtube.com/watch?v=ojZbFIQSdl8&feature=youtu.be... is a nice explanation for curly-brace programmers of why it's called a sum type (vs structs, which are product types)


The name has its origin in type theory (you also have product types). It's not the best name I guess, but not the worst either: If we have type A with n terms and type B with m terms, then the sum type of A and B has n + m terms.

There are much more confusing things in type theory, like the ambiguity that the word 'type' has (I still don't grok the terminology, to be honest).


> Also I think the name 'sum type' is bad.

The intuition is that for any types A and B with numbers of elements |A| and |B|, the sum type A+B has |A|+|B| elements.


A sum type is the more general name for a Swift/Rust enum.


the category theory term is co-product

(which honors the convention of expressing a dual using "co-" and indeed co-product is the dual of product)


Of course there's nothing new under the sun; this is one of the features of Algol 68 (1968) that Wirth discarded when he made Pascal.

They're not coming; they've been and gone, probably a couple of times.


All I know about Algol 68 is what I just gleaned from Wikipedia, but there I got the impression that what it had was a limited form. Namely, the type indexed things mentioned in the featured article as present in C++ and D, where you don't have tags and thus cannot usefully distinguish different variants with the same underlying type.


Nope, A68 unions had hidden tags (strictly speaking the spec didn't say they had tags, but that was the only obvious implementation). It did have a case statement to switch on the tags and extract the values as described above.


I prefer the C++ name, variant; super useful and a new addition to C++17.


Also, std::optional, which is analogous to Maybe.

Of course, both have been in boost for some time now. C++ so far has been doing this at the library level. But it remains to be seen if they will make their way into actual interfaces, for example in standard algorithms.


Sum types are not all that different from subtyping+inheritance; they're simply the other side of the expression problem.


Unlike subtyping, sum types are bounded - you can't add new variants without modifying the originally-defined type.

That's a huge difference in terms of code architecture - it enables things like exhaustive pattern matching, and allows the programmer to enforce stronger invariants in the code.


Subtyping can be closed, too (see Scala's sealed classes, for example) and algebraic data types can be open (see OCaml's extensible variants [1] or polymorphic variants [2]).

This is simply a distinction between closed and open sum types.

[1] https://caml.inria.fr/pub/docs/manual-ocaml/extn.html#s%3Aex...

[2] http://caml.inria.fr/pub/docs/manual-ocaml-400/manual006.htm...


Actually, they're the "other side" of product types, which are sometimes called "records" or "structs". Maybe they're occasionally used to solve the same problem, but inheritance is unrelated. Inheritance is a variety of ad-hoc methods for sharing code, with quite different semantics in every language where it's implemented.


Other side?


http://wiki.c2.com/?ExpressionProblem

> The "expression problem" is a phrase used to describe a dual problem that neither ObjectOrientedProgramming nor FunctionalProgramming fully addresses.

"dual problem", meaning you choose ObjectOrientedProgramming or FunctionalProgramming, which neither fully address.

neither "side" is able to resolve the ExpressionProblem


Interesting, though I'm not sure I like the notion that it's a two sided coin. I've also never been a fan of the OO vs FP argument.


Do we really need a new name for this? It's basically "safe C-style unions", but pretty much everything in C is unsafe, so you might as well just call it "unions".


This is not a new name... in the theory of algebraic data types there is a model of what it means to add and multiply types, resulting in "sum" and "product" types. I am not 100% sure but I believe these terms have been used since the early 70s.

The term "union" as used in C is highly related but not quite the same. If you add a tag to the union (and so have a "tagged union", often represented in C as a struct with an enum and a union) then you are equivalent in an information sense (but the usage is still different, as a C union says something very specific about how memory is used).


As mentioned elsewhere, 'union' originally came from Algol 68, where (eventually) they were implemented exactly as described in the article.


It's not a new name, just one you're unfamiliar with.


Names are evil, abolish yourself of all naming and achieve true enlightenment!


Could you please stop posting unsubstantive comments to Hacker News?

https://news.ycombinator.com/newsguidelines.html


I wouldn't exactly call this unsubstantive... though maybe it was a bit aggressive towards the OP.


I sure hope so


same <3



