Approximating sum types in Python with Pydantic (yossarian.net)
158 points by woodruffw 28 days ago | 97 comments



I think it would be useful to differentiate more clearly between what is offered by Python's type system, and what is offered by Pydantic.

That is, you can approximate Rust's enum (sum type) in pure Python using some combination of Literal, Enum, Union, and dataclasses. For example (more here[1]):

  from dataclasses import dataclass

  @dataclass
  class Foo: ...
  @dataclass
  class Bar: ...

  Frobulated = Foo | Bar
Pydantic adds de/ser, but if you're not doing that then you can get very far without it. (And even if you are, there are lighter-weight options that play well with dataclasses, like cattrs, pyserde, and dataclasses-json.)
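
For illustration, a minimal cattrs sketch (my own, not from the post; it assumes a recent cattrs that can auto-disambiguate dataclass unions, so Foo and Bar get distinct required fields here):

    from dataclasses import dataclass
    from typing import Union

    import cattrs

    @dataclass
    class Foo:
        foo_value: int

    @dataclass
    class Bar:
        bar_value: str

    Frobulated = Union[Foo, Bar]

    converter = cattrs.Converter()

    # Round-trip through plain dicts; cattrs picks the variant by its
    # unique required field.
    data = converter.unstructure(Foo(foo_value=1))  # {'foo_value': 1}
    obj = converter.structure(data, Frobulated)     # Foo(foo_value=1) again
    assert obj == Foo(foo_value=1)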

[1] https://threeofwands.com/algebraic-data-types-in-python/


Yep, someone brought this up on another discussion forum. The post was intended to be explicitly about accomplishing the ser/de half as well, hence the emphasis on Pydantic :-)

(Python’s annotated types are very powerful, and you can do this and more with them if you don’t immediately need ser/de! But they also have limitations, e.g. I believe Union wasn’t allowed in isinstance checks or matching until a recent version.)
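
For reference, a small sketch of my own (Python 3.10+), where a PEP 604 union is accepted by isinstance and match dispatches over the variants:

    from dataclasses import dataclass

    @dataclass
    class Foo:
        x: int

    @dataclass
    class Bar:
        y: str

    Frobulated = Foo | Bar

    def describe(value: Foo | Bar) -> str:
        # Unions in isinstance checks are allowed since Python 3.10.
        assert isinstance(value, Frobulated)
        match value:
            case Foo(x=x):
                return f"Foo carrying {x}"
            case Bar(y=y):
                return f"Bar carrying {y}"
            case _:
                raise AssertionError("unreachable")

    print(describe(Foo(x=1)))  # Foo carrying 1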


If you're looking for serialization/deserialization, you might consider confactory [1]. I created it to be a factory for objects defined in configs. It builds the Python objects with little effort from the user, simply by making use of type annotations (though you can define your own serializers and deserializers).

It also supports complex structures like union types, lists, etc. I used it to create cresset [2], a package that allows building PyTorch models directly from config files.

[1]: https://pypi.org/project/confactory/ [2]: https://pypi.org/project/cresset/


I think it’s quite useful to separate ser/de, structural validation, and semantic validation. This is where I struggle with a library like ruamel.yaml, running deserialization and structural validation together, or Pydantic, running structural and semantic validation together. It’s not hard to write a Python type annotation for what you get from json.loads, and it’s also not hard to write a recursive function with a 200 line match statement that reflects on type annotations to convert that to typeddicts, data classes, and so forth. But semantic validation is a whole other problem, one that tends to be so domain specific it’s better deferred. Not that you shouldn’t do it, but that it belongs in its own data processing layer. Also this lets you be specific about what’s wrong with a piece of input. Bad JSON? A list where a dictionary was expected? An end timestamp that’s before the start? Sure, check each of these, and in context make invalid state unrepresentable, but invalid state after json.loads is very different from invalid state after validating your timestamps.
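
A sketch of that layering (the names here are hypothetical, not from any library): json.loads handles deserialization, a structuring step handles structural validation, and semantic checks live in their own later pass:

    import json
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Interval:
        start: datetime
        end: datetime

    def structure(raw: object) -> Interval:
        # Structural validation: a JSON object with two ISO 8601 timestamps.
        if not isinstance(raw, dict):
            raise TypeError("expected a JSON object")
        return Interval(
            start=datetime.fromisoformat(raw["start"]),
            end=datetime.fromisoformat(raw["end"]),
        )

    def check_semantics(interval: Interval) -> Interval:
        # Semantic validation, deferred to its own data processing layer.
        if interval.end < interval.start:
            raise ValueError("end timestamp precedes start")
        return interval

    doc = '{"start": "2024-01-01T00:00:00", "end": "2024-01-02T00:00:00"}'
    interval = check_semantics(structure(json.loads(doc)))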


Yeah, and in practice most people will probably be using Pydantic anyway. Just wanted to point out it's not strictly necessary. :)


Pydantic offers runtime checks.

Also I’d add msgspec to your list at the end. Lightweight and fast, handles validation during decoding.
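
For example, a minimal msgspec sketch (the types are my own illustration): tagged Structs form a discriminated union, and validation happens while decoding:

    from typing import Union

    import msgspec

    class Foo(msgspec.Struct, tag=True):
        value: int

    class Bar(msgspec.Struct, tag=True):
        value: str

    Frobulated = Union[Foo, Bar]

    # tag=True adds a "type" field holding the class name, which the decoder
    # uses to pick the right variant as it validates.
    obj = msgspec.json.decode(b'{"type": "Foo", "value": 1}', type=Frobulated)
    assert obj == Foo(value=1)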


Good point, but that's not always desirable. If you have strict type-checking and _aren't_ doing ser/de, it's likely not necessary (e.g. Rust doesn't do runtime checks).


The situation we found where it's still useful is if your app supports extension-type functionality. The 3rd parties writing extensions would ideally be type-checking during development... but they might not bother. Runtime validation becomes useful at the interfaces.


Typeguard too. The @typechecked decorator on any function or method will blow up with an error at runtime if the types do not match.
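
A small sketch of that (my own example; the exact exception type varies by typeguard version):

    from typeguard import typechecked

    @typechecked
    def double(x: int) -> int:
        return x * 2

    double(2)  # fine

    try:
        double("two")  # wrong type: typeguard raises at call time
    except Exception as exc:
        print(exc)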


I needed to reflect Rust enums and went a bit further with that approach. All variants are wrapped in a decorated class, where the decorator automatically computes the union type and adds de/serialization hooks for `cattrs`.

    @enumclass
    class MyEnum:
        class UnitLikeVariant(Variant0Arg): ...
    
        class TupleLikeVariant(Variant2Arg[int, str]): ...
    
        @dataclass
        class StructLikeVariant:
            foo: float
            bar: int

        # The following class variable is automatically generated:
        #
        # type = UnitLikeVariant | TupleLikeVariant | StructLikeVariant
where the `VariantXArg` classes are predefined.


Fascinating, how did you get the type hint on the `type` class variable to be correct? (Or is this not visible to mypy?)


Does MyPy properly validate the use of these types?

Do you have anything public that elaborates on this?


Everyone is offering their suggestions, but no one has posted about marshmallow which handles everything out of the box including serialization and de-serialization. It's the perfect balance of dataclasses, (de)serialization, and lack of useless features and umpteen hacks that libraries like Pydantic and FastAPI have.


But marshmallow doesn’t do any of the (compile-time) typing stuff.

If you don’t care about types and just want ser/de that’s great, but I think it’s clearly on topic here to care about types.


Last time I tried dataclasses-json it had no type safety whatsoever and relied on the data being correct without checking it.

It was also an order of magnitude slower than other libraries, and at the time all these libraries were much slower.


There is also https://github.com/zifeo/dataconf that relies heavily on dataclasses to represent configs.


The problem I see with it is this: Now, instead of understanding Python, which is straightforward, you have to understand a bunch about Pydantic and type unions. In a large shop of Python programmers, I would expect many would not follow most of this.

Essentially, if this is a feature you must have, Python seems like the wrong language. Maybe if you only need it in spots this makes sense...


I think an important piece of context here is that this is not useful for non-ser/de patterns in Python: if all you have is pure Python types that don't need to cross serialization boundaries, then you can do all of this in pure Python (and refine it with Python's very mature type annotations).

In practice, however, Pydantic is one of the most popular packages/frameworks in Python because people do in fact need this kind of complexity. In particular, it makes wrangling complicated object hierarchies that come from REST APIs much easier and less error-prone.


> instead of understanding Python, which is straightforward, you have to understand a bunch about Pydantic and type unions.

This is like saying "instead of understanding Python, you have to understand a bunch about SQLAlchemy and ORMs" or "instead of understanding Python, you need to understand GRPC and data streaming."

Ultimately every library you add to a project is cognitive overhead. Major frameworks or tools like sqlalchemy, Flask/Django, Pandas, etc. have a lot of cognitive overhead. The engineering decision is whether that cognitive overhead is worth what the library provides.

The measurement of worth is really dependent on your use case. If your use for Python is data scientists doing iterative, interactive work in Jupyter notebooks, Pydantic is probably not worth it. If you're building a robust data pipeline or backend web app with high availability requirements but dealing with suspect data parsing, Pydantic might be worth it.


You're not wrong, but the distinction here that I was responding to was the idea of needing to use Pydantic routinely for typechecking. Libraries that you have to know might as well be language features.

The phrasing of "The engineering decision" in your reply is telling -- you are coming from it as an engineer. But I'm looking at the population of Python programmers, which extends far beyond software engineers. The more such people have to learn, the more problematic the language becomes. Python succeeded despite not being a statically compiled language with clear typechecking because there is an audience for which those aren't the critical factors.

As I said in another response, it reminds me of what happened to Java. Maybe that's just my own quirk, but none of these changes are free.


I think you are underestimating Python developers. When Python became popular, popular languages did not have such expressive type systems. Java and Perl were popular then.

Also, here it is claimed the library should be part of the language, and at the same time it is assumed it is too complicated for the users to understand. It seems like the feature being a library solves this, if we let go of the self-imposed requirement of it being part of the language.


When Python started to become popular as a Perl alternative, and Zope became a thing to be aware of, I had already learned about Caml Light, Objective Caml, Miranda, Standard ML, and the new kid in town, Haskell.

Also, even within the constraints of the C++98 type system, expressiveness wasn't something C++ was lacking.


> When python became popular popular languages did not have such expressive type systems. Java and perl were popular then.

I really like some of the languages you mention, but most of them were not popular, especially in Python's domain at the time (scripts and web servers).

The main contenders against Python then were Perl (timtowtdi vs Python one way) and Java.

Java and C++ were strongly typed, but lacking most of the nice things of Haskell etc (at least Java). C++ is very expressive (prob more because of templates than the type system, but I will happily concede this one).

For scripts and web backends, C++ and Haskell/ML were not popular. This leaves Java, Perl, PHP, and similar, and at the time none had advanced type systems in the ergonomic way that is now expected.


> I think you are underestimating python developers.

There are a lot of people out there writing Python, and a lot of them identify as analysts, (non-software) engineers, scientists, and so on. Not developers. Some of them write immaculate code. Some of them don’t know about git, functions, or commandline arguments, so their code is one long script with big chunks they comment or uncomment depending on what they’re trying to do. The latter are a big constituency for me. Plain type annotations are great in this context, because they place no burden on the user at all. All they have to do is ignore them. Best case, they notice that function f returns a list of floats rather than an ndarray and that saves me having to explain what’s going on.


Usually syntax makes things easier, certainly for types. That's why we have syntax.

I don't claim Python developers cannot understand it. But every additional thing adds to the cognitive burden.


> Libraries that you have to know might as well be language features.

What you have to know depends on where you're working and what you're doing. You don't have to know GRPC Python libraries, unless it's a company that uses GRPC for internal communication. You don't have to know Flask unless you're building a REST API using Flask. You don't have to know beautifulsoup unless you're building a web scraper. You don't have to know Pydantic unless you're working on a project that uses Pydantic for data validation.

> The phrasing of "The engineering decision" in your reply is telling -- you are coming from it as an engineer. But I'm looking at the population of Python programmers, which extends far beyond software engineers.

You don't have to be a software engineer to make an engineering decision. When a data scientist uses conda because they don't want to manage their Python environment manually, but runs into performance issues on production because their Docker containers are multiple gigabytes larger than they should be -- that's the result of an engineering decision. When a business analyst writes a Python script and manually installs packages without a requirements file, then tries to get it running on a new computer 8 months later but can't because they forgot which package versions they used -- that's the result of an engineering decision. So when you deploy your code without any data validation and it runs fine now, but breaks in unexpected ways next week because the result of an external REST API you're calling changed unexpectedly...

> The more such people have to learn, the more problematic the language becomes. Python succeeded despite not being a statically compiled language with clear typechecking because there is an audience for which those aren't the critical factors. ... none of these changes are free.

I agree with all this, which is why I said that the engineering decision is deciding whether or not the cost is worth it. Different projects, companies, and people will have different needs.

Your original assertion was that "if this is a feature you must have, Python seems like the wrong language" -- but this contradicts what you're saying. The overhead of learning a single Python library is far, far less than, say, introducing Rust into a company that only uses Python for everything else.


Yes, at that point in time you wouldn't switch from Python. Hence the comments about Java, as an example of where escalating complexity can take you.

I think my assertion, less pithily, was "if having the best type-checking system was critical to you, probably you wouldn't pick Python". And I think that's correct. People pick it for other features.

I didn't say I hated having the option. I expressed reservations, which I still have.


> if having the best type-checking system was critical to you, probably you wouldn't pick Python

Agree, but the only situation where as a developer you can pick a language/ecosystem on its own merits, independently of anything else, is on personal projects. Even if you're a startup CTO building a greenfield app, you have to account for hiring and training developers. It's perfectly sensible that you would want to use Python + mypy/pyright/pydantic/etc for extra robustness, since it's easy to find Python devs, with a relatively small learning curve if they haven't used those tools, vs going full-on Rust or Haskell, which would require much rarer and more expensive people and/or a much longer training period.


> Ultimately every library you add to a project is cognitive overhead. Major frameworks or tools like sqlalchemy, Flask/Django, Pandas, etc. have a lot of cognitive overhead. The engineering decision is whether that cognitive overhead is worth what the library provides.

IMO a library that provides regular functions and values that follow the rules of the language adds zero cognitive overhead. Frameworks that change/break the rules, that let you do things that you can't normally do with regular values, or don't let you do things that you normally could do, are the ones that add overhead, and it sounds like Pydantic is more in that category.


Pydantic is truly a godsend to the Python ecosystem. It is a full implementation of "parse don't validate" and does so using Python's existing type declarations. It uses the same forms as dataclasses, SQLAlchemy, and Django that have been part of Python forever so most Python programmers are familiar with it. And the reason you reach for it is that it eliminates whole classes of errors when the boundary between your program and the outside world is only via .model_validate() and .model_dump(). The outside world including 3rd-party API calls. The data either comes back to you exactly like you expect it to, or it errs. It's hundreds of tests that you simply don't have to write.

In the same way that SQLite bills itself as the better alternative to fopen(), Pydantic is the better alternative to json.loads()/json.dumps().
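
A minimal sketch of that boundary (my own example, using the Pydantic v2 method names):

    from pydantic import BaseModel, ValidationError

    class User(BaseModel):
        id: int
        email: str

    raw = {"id": "42", "email": "a@example.com"}  # e.g. a decoded API response

    user = User.model_validate(raw)  # coerces "42" -> 42, or errs
    print(user.model_dump())         # {'id': 42, 'email': 'a@example.com'}

    try:
        User.model_validate({"id": "not-a-number"})
    except ValidationError as exc:
        print(exc)  # pinpoints which fields failed and why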


I don't think you are wrong and I have at times missed having such an option. But... I saw Java go down this path of cool features that you needed to learn, on top of the basic language, and eventually it took Java to an environment where learning the toolset and environment was complex, and vastly changed the calculus of how approachable the language was. In my mind, anyway, it went from being a useful if incomplete tool to being a more complete language that was not really worth messing with unless you were going to make a big commitment.

Every step that takes Python in that direction is a mistake, because if we need to make a huge commitment, Python probably isn't the right language. A large part of the appeal of Python is that it is easy to learn, easy to bring devs up to speed on if they don't know it, easy to debug and understand. That's why people use it despite its performance shortcomings, despite its concurrency issues, etc. (That and the benefit of a large and fairly high quality library.)


I think you're right, but I take a different view of it and think it's great. Python is changing so that the language you switch to when you need more performance or type safety is... Python. At some level you have to meet users where they are, and large complex applications are already written in Python. And it's those kinds of developers who are more invested in the future direction of the language.

I think Python's journey is very similar to Go in this regard where as the language matures and more people start using it for large applications you start having to compromise on the ease of on-boarding in favor of the users who are trying to get work done. Both Python and Go added generics around the same time.


Yes, the following is so easy in OCaml, it would be a major undertaking in Python:

  type committer =
    InnerCircle of string
  | NPC of string
  | Dissenter of string
  ;;

  type coc_reaction =
    DoNothing
  | ThreeMonthsWithoutHumiliation
  | PublicDefamation
  ;;

  let adjudicate = function
    InnerCircle _ -> DoNothing
  | NPC _ -> ThreeMonthsWithoutHumiliation
  | Dissenter _ -> PublicDefamation
  ;;

  # adjudicate (InnerCircle "Wouters");;
  - : coc_reaction = DoNothing
  # adjudicate (Dissenter "Peters");;
  - : coc_reaction = PublicDefamation
Just use another language, also for social and professional reasons.


As a matter of fact this would not be a "major undertaking" in Python, unless your definition of the term is majorly loose:

    from dataclasses import dataclass
    from enum import Enum, auto
    from typing import Union

    @dataclass
    class InnerCircle:
        s: str

    @dataclass
    class NPC:
        s: str

    @dataclass
    class Dissenter:
        s: str

    Committer = Union[InnerCircle, NPC, Dissenter]

    class CocReaction(Enum):
        DoNothing = auto()
        ThreeMonthsWithoutHumiliation = auto()
        PublicDefamation = auto()

    def adjudicate(c: Committer) -> CocReaction:
        match c:
            case InnerCircle():
                return CocReaction.DoNothing
            case NPC():
                return CocReaction.ThreeMonthsWithoutHumiliation
            case Dissenter():
                return CocReaction.PublicDefamation
Although in reality you'd likely model Committer as a product of a status and name, and adjudicate as a map of status to reaction, unless there are other strong reasons to make Committer a sum.
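
A sketch of that product-shaped alternative (my own, reusing the CocReaction enum from the snippet above):

    from dataclasses import dataclass
    from enum import Enum, auto

    class Status(Enum):
        InnerCircle = auto()
        NPC = auto()
        Dissenter = auto()

    @dataclass
    class Committer:
        status: Status
        name: str

    # Adjudication becomes a plain mapping from status to reaction.
    ADJUDICATION = {
        Status.InnerCircle: CocReaction.DoNothing,
        Status.NPC: CocReaction.ThreeMonthsWithoutHumiliation,
        Status.Dissenter: CocReaction.PublicDefamation,
    }

    def adjudicate(c: Committer) -> CocReaction:
        return ADJUDICATION[c.status]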


I think it is normal to know the popular libraries of a language. For Python, Django, DRF, FastAPI, Pydantic, or Jinja are very common.

There are some people resisting type checks in Python, but I think fewer and fewer. I don't think people refusing to learn basic concepts and libraries are a reason not to use something.

Also, I am not a big fan of not doing something useful because we need to do a bit of learning. It seems like a variant of "we have always done it this way". Plus it is a strawman attributed to Python developers, IMO.


As it just so happens, I was struggling with this in Python recently and this post describes a better solution than what I came up with.

> Essentially, if this is a feature you must have, Python seems like the wrong language.

While I don't disagree in the absolute sense, there are constraints. You can't just switch language or change the problem you're solving. If you have the need for more type safety, then this is a price worth paying.


Okay? Programmers have to understand lots of things that aren't just the bare basics of the language they're using. When did we decide that all software developers are helpless? When can we get back to expecting experts to know things?


When? Probably around the time the people hoping they won't need to know anything, because AI will write what they want, discover that desire without knowledge doesn't work so well.


Completely disagree. Pydantic, FastAPI, type hints, mypy, and pyright have all made Python much more enjoyable to use and less error-prone.


One caveat of the tip in the "Deduplicating shared variant state" section, about including an underspecified discriminator field in the base class, is that it doesn't play well if you're using Literals instead of Enums as the discriminator type. Python does not allow you to narrow a literal type of a field in a subclass, so the following doesn't type check:

  from typing import Literal
  
  class _FrobulatedBase:
      kind: Literal['foo', 'bar']
      value: str
  
  class Foo(_FrobulatedBase):
      kind: Literal['foo'] = 'foo'
      foo_specific: int
  
  class Bar(_FrobulatedBase):
      kind: Literal['bar'] = 'bar'
      bar_specific: bool


  "kind" overrides symbol of same name in class "_FrobulatedBase"
    Variable is mutable so its type is invariant
      Override type "Literal['foo']" is not the same as base type "Literal['foo', 'bar']"
https://pyright-play.net/?code=GYJw9gtgBALgngBwJYDsDmUkQWEMo...


> it doesn't play well if you're using Literals instead of Enums as the discriminator type

The original example code with Enums doesn't type-check either, and for the same reason:

If the type checker allowed that, someone could take an object of type Foo, assign it to a variable of type _FrobulatedBase, then use that variable to modify the kind field to 'bar' and now you have an illegal Foo with kind 'bar'.


mypy typechecks that just fine[1].

However, I think that's possibly a bug :-) -- I agree that narrowing a literal via subclassing is unsound. That's why the example in the blog used `str` for the superclass, not the closure of all `Literal` variants.

(I use this pattern pretty extensively in Python codebases that are typechecked with mypy, and I haven't run into many issues with mypy failing to understand the variant shapes -- the exception to this so far has been with `RootModel`, where mypy has needed Pydantic's mypy plugin[2] to understand the relationship between the "root" type and its underlying union. But it's possible that this is essentially unsound as well.)

[1]: https://mypy-play.net/?mypy=latest&python=3.12&gist=f35da62e...

[2]: https://docs.pydantic.dev/latest/integrations/mypy/


Using str in the superclass is equally unsound and also doesn't type check. There's no good way to do it, as the discriminator type is by definition disjoint between all the kinds.


The problem here seems to be inheritance. Why share a base class? Why not write Frobulated = Foo|Bar and go about your day writing functions like

    def frotz(x: Frobulated) -> str:
        return f"{x.value} is the value of x"


Meta comment.

Something I've wondered of late. I keep seeing these articles pop up and they're trying to recreate ADTs for Python in the manner of Rust. But there's a long history of ADTs in other languages. For instance we don't see threads on recreating Haskell's ADT structures in Python.

Is this an artifact of Rust being the hype right now, especially on HN? As in, the typical reader is more familiar with Rust than Haskell, and thus "I want to do what I'm used to in Rust in Python" is more likely to resonate than "I want to do what I'm used to in Haskell in Python"?

At the end of the day it doesn't *really* matter as the underlying construct being modeled is the same. It's the translation layer that I'm wondering about.


(Author of the post.)

I think so, in the sense that Rust has successfully translated ADTs and other PLT-laden concepts from SML/Haskell into syntax that a large base of engineers finds intuitive. Whether or not that’s hype is a value judgement, but that is the reason I picked it for the example snippet: I figured more people would “get” it with less explanation required :-)


Apologies for my meta-meta-comment :) I've been writing code for ~30 years in various languages, and today my brain can't compute how people find any syntax other than this more intuitive:

  data Thing
    = ThingA Int
    | ThingB String Bool
    | ThingC
To me, the above syntax takes away all the noise and just states what needs to be stated.


Got it. It makes sense and was what I figured is the case. I find it interesting as it's a sign of the times watching the evolution of what the "base language" is in threads like this over time. I mentioned in another comment that several years ago it'd have been Haskell or Scala. If one went back further (before my time!) it'd probably have been in OCaml or something.


In my experience learning a bit of OCaml after Rust, and then looking at Haskell, the three aren't all that different in terms of the basics of how ADTs are declared and used, especially for the simpler cases.


Agreed. As a concept they're all the same thing.

Another way of phrasing my query is that given these are all basically ML-style constructs, why would the examples not be ML? And I was assuming the answer to that is "the sorts of people reading these blogs in 2024 are more familiar with Rust"


I think a second reason might be that translating OCaml/Haskell concepts to Python has that academic connotation to it. Rust also (thanks to PyO3) has more affinity to Python than the ML languages. I guess it isn't a surprise that this post has Python, C++, and Rust, all "commonly" used for Python libraries.


I think "hype" has some connotations that I wouldn't necessarily agree with, and I don't think it's as much "on HN" as "people who write Python," but I would agree that I would expect at this point more Python folks to be familiar with Rust than Haskell, and so that to be the reason, yes.


The reason I said hype is that it's a cycle here. If you go back 10 years every example *would* have been in Haskell. Or perhaps Scala. They were the cool languages of the era. And the topics here painted a picture that their use in the broader world was more common than they really were. And I say that as someone who used both Haskell & Scala in my day job at the time. HN would have you believe that I was the norm, but I very much was not.

That's not to say it's bad, or a problem. If it gets more people into these concepts that's great.


It is quite common to see people in Rust circles mentioning Rust being innovative for feature XYZ that initially appeared in an ML variant, Ada, Eiffel, ....

I would say familiarity, and lack of exposure to programming languages in general.


Nowhere in this post or in any Rust community post I'm aware of does anybody claim that sum types (or product types, or affine/linear types, etc.) are a Rust novelty.

As a stretch, I've seen Rust content where people claim that Rust has successfully popularized a handful of relatively obscure PLT concepts. But this is a much, much weaker claim than Rust innovating or inventing them outright, and it's one that's largely supported by the size of the Rust community versus the size of Haskell or even the largest ML variant communities.

(I say this as someone who wrote OCaml for a handful of years before I touched Rust.)


Where did I specifically mention it was this post, and not in general?

Here is another common one: "It would be great to have a Rust-like language, but with GC".


> Here is another common one: "It would be great to have a Rust-like language, but with GC".

What in this phrase suggests or implies that Rust has innovated something that an earlier FP language actually did? Something that resembles Go's managed runtime but with Rust's sum types seems like a very reasonable thing to want, and doesn't exist per se without buying into a very foreign syntax and thus a much smaller community and library ecosystem.

(Or as another phrasing: what is actually wrong with someone saying this? Insufficient credit given to other languages? Do people apply this standard to C with BCPL and ALGOL? I haven't seen them do so.)


> Something that resembles Go's managed runtime but with Rust's sum types ..... Or as another phrasing: what is actually wrong with someone saying this?

I don't think there's anything wrong per se. Although I do think it contributes to the sentiment that people may be ascribing things as novel to Rust, even when not intended, as in this case. To be fair, that's what sent me down the mental path earlier that prompted this subthread. And that's when I figured it was more a matter of being the implementation most likely to resonate with the audience.

And I don't think it's a matter of needing to give credit to other languages. But phrasing it like "Something with a managed runtime, but with sum types" is generic enough, unless there's something specific about either of those. For instance the phrasing I gave does exist in plenty of places, but perhaps "Something that resembles Go's managed runtime with sum types" does not. I don't know enough about Go to say that.

In other words, is there something specific about *Rust*'s sum types that one is after in this example? Or just the concept of sum types.


> In other words, is there something specific about Rust's sum types that one is after in this example? Or just the concept of sum types.

I think, concretely, it's the fact that Rust's syntax is more intuitive to the average engineer than ML or Haskell. Maybe that's a failure of SWE education! But generally speaking, it's easier to explain what Rust does to someone who has taken a year or two of Java, C, or C++ than to explain ML to them.


I agree and think you're right, to a point. But I would posit that a much higher percentage of devs than the typical HNer would expect would find the Rust syntax to be pretty arcane. Although I grant that they'd find Haskell to be *more* arcane for sure.

And that stopping point I think is where the perception of Rust's popularity on sites like HN is much higher than in the general public. And by that I mean people who at least grok, if not use, Rust and not people who like the idea of Rust.

For instance, keep in mind that even during the heyday of Scala here on HN the rest of the JVM world was complaining that Scala syntax was too arcane.


No particular disagreement there!


It implies a complete lack of knowledge that something like that already exists, predating Rust by a few decades.

In the contexts where it pops up, it is as if such a language were yet to come.

Speaking of C and BCPL, indeed we do, because many wrongly believe the urban myth that without them there was nothing else in the way of high-level systems programming languages, even though JOVIAL came to be in 1958, followed by ALGOL and PL dialects; Bootstrap CPL was never planned to be used beyond that purpose, and there was rich research outside Bell Labs into systems programming in high-level languages.

Instead we got stuck with something that 50 years later are still trying to fix, with Rust being part of the solution.


> It implies a complete lack of knowledge that something like that already exists, predating Rust by a few decades.

I don't understand why you think this: we explain things all the time without presuming that the particular choice of explanation implies ignorance of a preceding concept. In high school physics, for example, you wouldn't assume that your teacher doesn't know who Ptolemy is because they start with Newton.

The value of an explanation is in its effectiveness, not a pedantic lineage of the underlying concept. The latter is interesting, at least to me, but I'm not going to bore my readers by walking them through 65 years of language evolution just to get back to the same basic concept that they're able to intuit immediately from a ~6 line code snippet.

(It's also condescending to do so: there's no evidence whatsoever that Rust's creators, maintainers, community, etc. aren't familiar with the history of PL development.)


For what it's worth, you're right. I saw the same thing happen with Go: everyone seems to think that Go invented static linking and *gasp* compiling executables, seemingly ignorant of the fact that we actually used to do that all the time, before bloated dynamic runtimes and massive virtual machines even existed. I don't trust software "experts" who don't know their history, because they usually don't know a lot of other important things, either.


It always has to be some moral thing with you people. What's "wrong" is that software practitioners who don't know their history are doomed to repeat it. It implies a lack of exposure to different parts of the field, and especially a lack of exposure to the theory. Someone who thinks Rust is an entirely new idea in computing probably has other massive gaps in their knowledge, and it follows the irrational pop culture this industry has cultivated where anything older than 18 months is bad, and anything newer than 18 months has never existed before and is the greatest thing since sliced bread.

Some of us are tired of cleaning up after the inevitable messes these developers leave behind.


What are you talking about? The “wrong” above is factual i.e. positive, not normative.

Please be a little bit more charitable with how you read comments. The core observation here is that “Rust is completely novel” is not actually something that Rust practitioners, including junior engineers, actually say. Nobody has said it in this thread, and nobody has even provided a single example of somebody saying it.


Is there any reason why you've singled out Rust as particularly notable here and not any of the many other languages with them? OCaml, Elm, F#, Scala, I think more recent versions of Java, Kotlin, Nim, TypeScript, and Swift all support ADTs. Python already supports them, albeit with very little runtime support. Rust doesn't particularly stand out in such a broad field of languages. They're so useful a language needs a good reason these days to not support them.


You're making the exact point that I was raising.


I'm sorry, I'm still completely confused where rust came from or what particular relevance it has to the conversation beyond the short segment in the article?

My point being—you see articles about ADTs involving non-rust languages all the time. Why single rust out?


FWIW I seem to often find myself reaching for Haskell-isms when writing TypeScript or Scala. And I’ve never actually written production Haskell code! But so many concepts like this just map nicely. “Parse don’t validate”, “make illegal states unrepresentable”, etc - all those patterns.


Also known as Type Driven Development, a much better approach than the other TDD abbreviation.


Author of typedload here.

typedload does this without needing to pass a "discriminator" parameter.

Just having the types with the same field defined as a literal of different things will suffice.

I've also implemented an algorithm to inspect the data and find out the type directly from the literal field, to avoid having to try multiple types when loading a union. Pydantic later implemented the same strategy.

typedload is faster than Pydantic at loading tagged unions. It is written in pure Python.

edit: Also, typedload just uses completely regular dataclasses or attrs. No need for all those different BaseModel and RootModel classes, or understanding when to use them.
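
For example (my own sketch, not from typedload's docs): plain dataclasses with a Literal field act as the tagged union:

    from dataclasses import dataclass
    from typing import Literal, Union

    import typedload

    @dataclass
    class Foo:
        kind: Literal["foo"]
        value: int

    @dataclass
    class Bar:
        kind: Literal["bar"]
        value: str

    # The literal "kind" field picks the variant during loading.
    obj = typedload.load({"kind": "bar", "value": "hi"}, Union[Foo, Bar])
    assert obj == Bar(kind="bar", value="hi")
    assert typedload.dump(obj) == {"kind": "bar", "value": "hi"}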


A pretty good article. It would be a great article if it used real-world examples instead of made-up "frobulated" ones.


I know that Foo and Frobulator and so on have a history in code examples, but I personally find examples with them require more careful reading than examples built on real concepts.


I teach for a living. Simple examples are fine for introducing a concept but students really grok it when they can see it in practice.


Something I've learned is that in general, people find it easier to follow concrete examples than abstract ones.

I agree with you that the article would have been improved if they'd used real-world examples, e.g. a ContactMethod type that has Address or PhoneNumber or something like that.


It's a shame there's so many different names for a set of very related (or identical?) concepts. For example wikipedia says "tagged union" is also known as "variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct". [https://en.wikipedia.org/wiki/Tagged_union]


Discriminated unions are also a wonderful part of the zod library. I use them to overload endpoints for multiple relevant operations.


+1 for the discriminated unions


I used to use io-ts heavily but zod is my go-to now - and it’s so ergonomic and easy for TypeScript newbies to pick up and grasp.


As an alternative to Pydantic, check out the wonderful mashumaro: https://github.com/Fatal1ty/mashumaro

I've also played around with writing my own dataclass/data conversion library: https://github.com/hexane360/pane


And people complain that TypeScript is crazy. I think we just need to acknowledge that typing is hard, especially with what mainstream languages give us.


Python has been about expressing ideas. Even if the language doesn't support some of the concepts natively, it's useful to express them in Python so they can be effectively transpiled into a language that does. This is what py2many needs: a curated subset of Python with some enhancements, as opposed to inventing a new language.

https://github.com/adsharma/adt contains a small enhancement for @sealed decorator from the excellent upstream repo.

https://github.com/py2many/py2many/blob/main/tests/cases/sea... https://github.com/py2many/py2many/blob/main/tests/expected/...


A slightly related discussion on Type Unions in C# from a week ago: https://news.ycombinator.com/item?id=41183240


I feel like, with TypeScript and Pydantic taking center stage, the dynamic vs static typing debate is finally coming to a close.

More and more Java seems to be not that bad after all.


Maybe it has gotten less bad, but Java was the main counter-argument and foe to static typing: a poor and inexpressive type system coupled with significant verbosity. Java required a lot of buck and provided very little bang for it. And that is just the language, ignoring the horrendous best practices.

I very much credit Java for having turned a generation away from static typing. Dynamic typing did get buoyed by a combination of Moore's law and good press, but could never have done it without Java having smothered the other side and being dreck.


Something went very wrong then, given Go's existence, making Java feel like a PhD level language.


Java never felt like a PhD-level language, except for a PhD in sucking.

Go does have a poor type system, but it has nowhere near the verbosity of early-aughts Java: local type inference, free functions, any number of (public) types in the same file, closures, type definitions (terse and easy newtyping), iteration (if only for builtin types until 1.23), etc...

And that's without considering the cultural side of requiring two different implementations of every type (one interface and one impl) or XML-oriented programming for bindings "improved" by unreliable parsers going through method comments.


The PhD-level remark is in reference to the Go authors' remarks on all programming languages that go beyond its lame type system, including Java's.

"The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt."

"It must be familiar, roughly C-like. Programmers working at Google are early in their careers and are most familiar with procedural languages, particularly from the C family. The need to get programmers productive quickly in a new language means that the language cannot be too radical."

We are all well aware of the blue collar goals of Java 1.0.

Now we are at Java 23 EA, where the distance to Go's type system is even greater.

We have Go's failure to learn from the history of programming languages, ironically following Java's missteps on generics (reaching out to the same Haskell folks that helped with the Pizza compiler), with warts of its own: magic string formats for timestamps, the const iota dance for enumerations, magic types, and tagged structs.

Had Rust become mature one or two years earlier, most likely Docker and Kubernetes would have pivoted from Python and Java respectively into Rust instead, with Go's fate being the same as Limbo's.


This is an interesting take. Go is great because it balances performance, expressiveness, ease of use, and bug prevention probably better than any other language. I’m very happy Go is the container-world language instead of Rust; Rust is just a pain.


Go is great because its authors happened to have a manager who allowed them to work on their side project to avoid using C++ at Google, and eventually it took off thanks to Docker and Kubernetes pivoting to Go, and their related success in the industry.

Had that not been the case, it would have been about as successful in the industry as Oberon-2 and Limbo, its two main influences, and (simplifying the actual historical facts) previous work of the authors.


It's never been about dynamic vs static typing. It was about the willingness to incur the cognitive, syntactic, and effort overhead of static typing.

Copilots have removed the effort overhead. Python's conciseness limits the syntactic overhead. Lastly, the emphasis on primitives and simple types limits the cognitive overhead.

Static typing is winning precisely because it avoids the issues of 2010s-era Java.


Next steps: immutable data structures and functional programming. #haskell #hereWeCome


> immutable data structures

Data pipelines & ML workflows are already pseudo immutable and pseudo functional.


It is now bad even as a typed language


It was always bad as a statically typed language, and it was a lot worse back then.


That looks crazy coming from statically-typed languages. So many hoops, so much effort, and custom structures just to verify types. Come on, modern C# can do all of that out of the box. You define a record/class for a DTO and System.Text.Json will either convert it successfully or throw an exception that says exactly what the problem was and at what character/field. Combined with much more advanced IntelliSense, the development comfort is so much better. But of course, whatever works for you.



