Don't return named tuples in new APIs (snarky.ca)
53 points by todsacerdoti 6 hours ago | 43 comments





Another problem brought about by their design being backwards-compatible with tuples is that you get wonky equality rules where two namedtuples of different types and with differently-named attributes can compare as equal:

    >>> Foo = namedtuple("Foo", ["bar"])
    >>> Baz = namedtuple("Baz", ["qux"])
    >>> Foo(bar="hello") == Baz(qux="hello")
    True
This also happens with the "new-style" namedtuples (typing.NamedTuple).
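
A minimal sketch of the same comparison using the class-based syntax, next to a dataclass equivalent for contrast. The class names are made up for the example.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Foo(NamedTuple):
    bar: str


class Baz(NamedTuple):
    qux: str


@dataclass
class FooData:
    bar: str


@dataclass
class BazData:
    qux: str


# NamedTuple instances inherit plain tuple equality, so only the values matter.
namedtuples_equal = Foo(bar="hello") == Baz(qux="hello")
equals_plain_tuple = Foo(bar="hello") == ("hello",)

# The generated dataclass __eq__ only compares instances of the same class.
dataclasses_equal = FooData(bar="hello") == BazData(qux="hello")

print(namedtuples_equal, equals_plain_tuple, dataclasses_equal)  # True True False
```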

I like the convenience of namedtuples but I agree with the author: there are enough footguns to prefer other approaches.


This seems like an invented problem that never comes up in practice. It is no more interesting than numpy arrays evaluating as equal even when they are conceptually not comparable:

    >>> temperatures_fahrenheit = np.array([10, 20, 30])
    >>> temperatures_celsius = np.array([10, 20, 30])
    >>> temperatures_fahrenheit == temperatures_celsius
    array([ True,  True,  True])

I've usually seen it come up when people try to hash the objects to use as dictionary keys or in sets, and then encounter very hard-to-troubleshoot issues later on. Obviously it's a bit weird to hash a bunch of objects of different types, but it's just one example of the footguns that namedtuples have and why I prefer other approaches.
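
A small sketch of how that plays out with dictionary keys, assuming two hypothetical namedtuple types that happen to hold the same value. Because they hash and compare like the plain tuple `(20,)`, the second assignment overwrites the first entry instead of adding a new one.

```python
from collections import namedtuple

# Hypothetical types, used only for illustration.
Celsius = namedtuple("Celsius", ["degrees"])
Fahrenheit = namedtuple("Fahrenheit", ["degrees"])

readings = {}
readings[Celsius(20)] = "mild"
# Same hash and equal to the existing key, so this replaces the value.
readings[Fahrenheit(20)] = "below freezing"

print(len(readings))          # 1
print(readings[Celsius(20)])  # below freezing
```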

The numpy equality thing is actually an enormous footgun, especially for people new to numeric Python. Equality for numpy and its derivatives should have had the traditional Python meaning (yielding a bool), and the current operation (yielding a mask) should have been put under a named method.

Counterpoint: Named tuples are immutable, while dataclasses are mutable by default.

You can use frozen=True to "simulate" immutability, but that just overwrites the setter with a dummy implementation, something you (or your very clever coworker) can circumvent by using object.__setattr__().

So you neither get the performance benefits nor the invariants of actual immutability.
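
To make the point concrete, here is a minimal sketch of the bypass on a hypothetical frozen `Point` dataclass. Normal assignment raises `FrozenInstanceError`, while calling `object.__setattr__` directly goes around the generated setter.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


point = Point(1, 2)

try:
    point.x = 10  # goes through the generated __setattr__, which raises
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# Calling the base implementation directly skips the frozen check.
object.__setattr__(point, "x", 10)

print(assignment_blocked, point)  # True Point(x=10, y=2)
```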


Counter-counterpoint:

- Everything in Python is mutable, including the definitions of constants like `3` and `True`. It's much like "unsafe" in Rust; you can do stupid things, but when you see somebody reaching for `__setattr__` or `ctypes` then you know to take out your magnifying glass on the PR, find a better solution, ban them from the repo, or start searching for a new job.

- Performance-wise, named tuples are sometimes better because more work happens in C for every line of Python, not because of any magic immutability benefits. It's similar to how you should prefer comprehensions to loops (most of the time) if you're stuck with Python but performance still matters a little bit. Yes, maybe still use named tuples for performance reasons, but don't give the credit to immutability.


> something you (or your very clever coworker) can circumvent by using object.__setattr__()

This fits pretty well with a lot of other stuff in Python (e.g. there’s no real private members in classes). There’s a bunch of escape hatches that you should avoid (but that can still be useful sometimes), and those usually are pretty obvious (e.g. if you see code using object.__setattr__, something is definitely not right).

Can’t tell whether this is good design or not, but personally I like it.


Counterpoint: I've used `object.__setattr__` pretty often when setting values in the `__post_init__` of frozen dataclasses

Is there a difference between the builtin setattr(obj, name, value) and object.__setattr__(obj, name, value)? I've seen setattr() in the wild all over but I've never encountered the dunder one.
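
A short sketch of the difference, using a hypothetical `Locked` class: the builtin `setattr()` behaves like normal assignment and dispatches to the class's own `__setattr__`, while `object.__setattr__` calls the base implementation and skips any override.

```python
class Locked:
    """Hypothetical class that rejects all attribute assignment."""

    def __setattr__(self, name, value):
        raise AttributeError(f"cannot set {name!r}")


item = Locked()

try:
    setattr(item, "size", 1)  # same as item.size = 1, uses Locked.__setattr__
    builtin_blocked = False
except AttributeError:
    builtin_blocked = True

# The base-class implementation ignores the override on Locked.
object.__setattr__(item, "size", 1)

print(builtin_blocked, item.size)  # True 1
```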

It's Python. You can override practically any behavior. Hell, use ctypes and mutate immutable tuples! Doing so is well-defined in the C API!

What bugs me more about frozen dataclasses is how post-init methods have to use the setattr hack.
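
For anyone who has not run into it, this is roughly what the hack looks like. A sketch with a hypothetical `Rectangle` class whose derived field has to be set in `__post_init__`:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float
    # Derived field, excluded from the generated __init__.
    area: float = field(init=False)

    def __post_init__(self):
        # self.area = ... would raise FrozenInstanceError here,
        # so the base-class setter is called directly.
        object.__setattr__(self, "area", self.width * self.height)


rect = Rectangle(3.0, 4.0)
print(rect.area)  # 12.0
```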


Oh you mean Python library APIs. I totally thought this was going to be a generic article about APIs delivered over http, the first thing I'd think of when someone says API.

Yeah, having spent the last few years in REST world I sort of thought the same thing.

Author could have used NamedTuple instead of dataclass or TypedDict:

    from typing import NamedTuple

    class Point(NamedTuple):
        x: int
        y: int
        z: int

I don't see "don't use namedtuples in APIs" as a useful rule of thumb, to be honest. Ordered and iterable return-types make sense for a lot of APIs. Use them where it makes sense.

I feel like "consider dataclasses as a useful default" is decent advice.

You get the stuff you get from `NamedTuple`, but you can also easily add helper methods to the class as needed. And there are other dataclass goodies (though some things I find to be a bit anti-feature-y).


> I feel like "consider dataclasses as a useful default" is decent advice.

I agree.

> You get the stuff you get from `NamedTuple`, but you can also easily add helper methods to the class as needed. And there are other dataclass goodies (though some things I find to be a bit anti-feature-y).

I've seen examples where dataclasses were used when order matters, however, hence why I'm not comfortable with a general rule against namedtuples. Sometimes order and iterability matter, and dogmatically reaching for a different data type that doesn't preserve that information might be the wrong choice.


You can have methods with NamedTuple

Oh, with the class declaration version? I never considered that, but feels obvious now.
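
A minimal sketch of a method on the class-declaration form. The `distance_to_origin` helper and the default for `z` are made up for illustration.

```python
import math
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int
    z: int = 0

    def distance_to_origin(self) -> float:
        return math.sqrt(self.x**2 + self.y**2 + self.z**2)


point = Point(3, 4)
x, y, z = point  # still unpacks like a regular tuple

print(point.distance_to_origin())  # 5.0
```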

Author argues that named tuples are bad (it’s literally the article title) so I think you miss the point?

The author claims that the reason people reach for namedtuples is brevity, but I'd argue to the contrary and include the modern syntax for defining a namedtuple. The syntax is nearly identical to TypedDict and dataclass.

There are other reasons to reach for namedtuples, for example when order or iterability matter. I don't see a general rule of not using namedtuples in APIs to be that useful. Use the right tool when the need calls for it.


Right. It takes equal effort to define a named tuple, typed dict or dataclass.

Mostly people just reach for the tool that does what they want, named tuples for tuply stuff, typed dicts for mapping applications, and dataclasses when you actually want a class.


The author also argues for readability over semantics, so I'm not sure they got the point to begin with.

these can be more memory-efficient than classes or dictionaries.

there was a point a while back where python added __slots__ to classes to help with this; and in practice these days the largest systems are using numpy if they're in python at all

not sure what modern versions do. but in the olden days, if you were creating lots of small objects, tuples were a low-overhead way to do it
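
A rough sketch for comparing the options on whatever interpreter you run, with hypothetical class names. The exact byte counts depend on the Python version and platform, so none are claimed here; the structural difference is that a slotted instance has no per-instance `__dict__`.

```python
import sys


class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainPoint(1, 2)
slotted = SlottedPoint(1, 2)
as_tuple = (1, 2)

# A regular instance carries an attribute dict; a slotted one does not.
print(hasattr(plain, "__dict__"), hasattr(slotted, "__dict__"))

# Sizes are implementation-specific; print them rather than assume them.
print(sys.getsizeof(as_tuple), sys.getsizeof(slotted), sys.getsizeof(plain))
```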


I think the best option for this, which is one listed in the article, is the dataclass. It's like a struct in C or Rust. It's ideal for structured data, which is, I believe, what a named tuple is intended for.

The annoyance of dataclasses, of course, is that they interact very awkwardly with immutability, which much of the Python ecosystem mandates (due to lacking value semantics).

But yes, they're still the least-bad choice.


Valid. One of my biggest complaints (perhaps my #1) about Python is its sloppy mutability and pass-by-value/reference rules.

Is there a situation where Python ever passes by value? Like you can sort of pretend for primitive types but I can't think of case where it's actually value.

Non-CPython implementations may pass by value as an optimization for certain immutable builtin types. This is visible to code using `is`.

(It's surprisingly difficult to implement a rigorous way to detect this vs compile-time constant evaluation though; note that identical objects of certain types are pooled already when generating/loading bytecode files. I don't think any current implementation is smart enough to optimize the following though)

  $ python3 -c 'o = object(); print(id(o) is id(o))'
  False
  $ pypy3 -c 'o = object(); print(id(o) is id(o))'
  True
  $ jython -c 'o = object(); print(id(o) is id(o))'
  True

Are you saying `o` is passed by value? I think this behavior is due to the return from `id()` being interned, or not. `id(o) == id(o)` will be true in all cases

I mean that the `id` function returns by value. It's not interning since that explicitly refers to something allocated, which isn't the case here.
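
On the broader pass-by-value question in this subthread, a small sketch of what ordinary function calls do in CPython: the callee receives a reference to the same object, so mutation is visible to the caller while rebinding the parameter name is not. The function names are made up for the example.

```python
def rebind(items):
    # Rebinds the local name only; the caller's list is untouched.
    items = [99]
    return items


def mutate(items):
    # Mutates the object that the caller also refers to.
    items.append(99)


data = [1, 2, 3]

rebind(data)
after_rebind = list(data)

mutate(data)
after_mutate = list(data)

print(after_rebind)  # [1, 2, 3]
print(after_mutate)  # [1, 2, 3, 99]
```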

> is that they interact very awkwardly with immutability

How so?

Tuples are awkward with immutability, if you put mutable things inside them.


By default, dataclasses can't be used as keys in a `dict`. You have to either use `frozen` (in which case the generated `__init__` becomes an abomination) or use `unsafe_hash` (in which case you have no guardrails).
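
A sketch of the three cases with hypothetical classes. The default dataclass is unhashable, the frozen one works as a key, and the `unsafe_hash` one is hashable but its hash follows its mutable fields, so a key can change hash while it sits in a dict.

```python
from dataclasses import dataclass


@dataclass
class Plain:
    x: int


@dataclass(frozen=True)
class Frozen:
    x: int


@dataclass(unsafe_hash=True)
class Unsafe:
    x: int


# eq=True without frozen sets __hash__ to None.
try:
    hash(Plain(1))
    plain_hashable = True
except TypeError:
    plain_hashable = False

lookup = {Frozen(1): "found"}
frozen_value = lookup[Frozen(1)]

key = Unsafe(1)
hash_before = hash(key)
key.x = 2  # allowed, and the hash changes with it
hash_changed = hash(key) != hash_before

print(plain_hashable, frozen_value, hash_changed)  # False found True
```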

In languages with value semantics, nothing about this problem even makes sense, since obviously a dict's key is taken by value if it needs to be stored.

--

Tuple behavior is sensible if you are familiar with discussion of reference semantics (though not as much as if you also support value semantics).

Still, at least we aren't Javascript where the question of using composite keys is answered with "screw you".


One advantage of (Named)Tuples over dataclasses or SimpleNamespaces is that they can be used as indices into numpy arrays, very useful when your API is returning a point or screen coordinates or similar.

This article seems vacuous to me. It misses the point that tuples are fundamental to the language with c-speed native support for packing, unpacking, hashing, pickling, slicing and equality tests. Tuples appear everywhere from the output of doctest, to time tuples, the result of divmod, the output of a csv reader and the output of a sqlite3 query.

Tuples are a core concept and fundamental data aggregation tool for Python. However, this post uses a trivial `Point()` class strawman to try to shoot down the idea of using tuples at all. IMO that is fighting the language and every existing API that either accepts tuple inputs or returns tuple outputs. That is a vast ecosystem.

According to the glossary, a named tuple is "any type or class that inherits from tuple and whose indexable elements are also accessible using named attributes." Presumably, no one disputes that having names improves readability. So really this weak post argues against tuples themselves.
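
Two of the standard-library examples mentioned above, as a quick sketch: `divmod` returns a plain tuple, and `time.gmtime` returns a `struct_time`, which supports both index and attribute access.

```python
import time

# divmod returns a plain 2-tuple that unpacks directly.
quotient, remainder = divmod(17, 5)

# struct_time is a tuple subclass with named fields.
epoch = time.gmtime(0)

print(quotient, remainder)                                 # 3 2
print(epoch.tm_year, epoch[0], isinstance(epoch, tuple))   # 1970 1970 True
```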


The beauty of Python is that it's so slow that you can relax and use what's clearest and most expensive. Finding yourself micro-optimizing things like tuple allocation time is a signal that you should be writing an extension or a numba snippet or something.

I think the core issue is about trust.

I trust that the maintainers of the Python language & the Python Standard Library are not going to change their tuple-using APIs in a breaking way, without a clear signal (like a major-version bump).

I do not extend that same trust to other Python projects. Maybe I extend that same trust to projects that demonstrate proper use of Semantic Versioning, but not to others.

Using something other than tuples trades some performance for some stability, which is a trade I’m OK with.


Data classes can gracefully replace tuples everywhere. Set frozen, then use a mixin or just author a getitem and iter magic, and you’re done.

They can't if you're using tuple unpacking.

    def __iter__(self):
        yield self.my_field
        yield self.my_other_field
Recreates tuple unpacking.
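
A fuller sketch of that idea, assuming a hypothetical `MousePosition` dataclass. It iterates over the declared fields instead of listing them by hand and adds `__getitem__` for index access. Note that the result still is not a `tuple` subclass, so `isinstance(pos, tuple)` checks will fail.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MousePosition:
    x: int
    y: int

    def __iter__(self):
        # Yield field values in declaration order.
        for f in fields(self):
            yield getattr(self, f.name)

    def __getitem__(self, index):
        return tuple(self)[index]


pos = MousePosition(3, 4)
x, y = pos  # unpacking works through __iter__

print(x, y, pos[0], isinstance(pos, tuple))  # 3 4 3 False
```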

I feel like get mouse coordinates is a perfect time to return a named tuple though?

Yes. That was a positive case for NamedTuple. The negative case was what if the function needs to grow further and return more stuff and then it's no longer clear what the return values are? For example, what if `get_mouse_coordinates()` becomes `get_peripheral_coordinates()` which, for some reason, needs to return the coordinates of all the peripherals as one flat namedtuple `NamedTuple(mouse_x: int, mouse_y: int, pointer_x: int, pointer_y: int, ...)`. I know it's a contrived example but it can happen for other kinds of functions.
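
A sketch of how that growth can break callers, with two hypothetical versions of the return type. Code that unpacks positionally fails once fields are added, while code that reads attributes keeps working.

```python
from typing import NamedTuple


class CoordinatesV1(NamedTuple):
    mouse_x: int
    mouse_y: int


class CoordinatesV2(NamedTuple):
    mouse_x: int
    mouse_y: int
    pointer_x: int
    pointer_y: int


def unpack_pair(coords):
    # A caller written against the two-field version.
    x, y = coords
    return x, y


old_result = unpack_pair(CoordinatesV1(1, 2))

try:
    unpack_pair(CoordinatesV2(1, 2, 3, 4))
    unpack_broke = False
except ValueError:  # too many values to unpack
    unpack_broke = True

# Attribute access is unaffected by the extra fields.
by_name = CoordinatesV2(1, 2, 3, 4).mouse_x

print(old_result, unpack_broke, by_name)  # (1, 2) True 1
```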

I think for the same reason you should avoid TypedDicts for new APIs as well. Dataclasses are the natural replacement for both.

Not really. A lot of tooling, JSON for example, naturally works with dictionaries. A TypedDict naturally connects with all those tools. In contrast, dataclasses are hostile to the enormous ecosystem of tools that work with dictionaries.

If you store all your data in dataclasses, you end up having to either convert back to dictionaries or having to rebuild all that tooling. Python's abstract syntax trees are an example. If nodes had been represented with native dictionaries, then pprint would work right out of the box. But with every node being its own class, a custom pretty printer is needed.

Dataclasses are cool but people should have a strong preference for Python's native types: list, tuple, dict, and set. Those work with just about everything. In contrast, a new dataclass is opaque and doesn't work with any existing tooling.
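
A small sketch of the JSON point, with hypothetical `PointDict` and `PointData` types. A `TypedDict` value is an ordinary dict at runtime, so `json.dumps` accepts it directly, while a dataclass instance raises `TypeError` until it is converted.

```python
import json
from dataclasses import dataclass
from typing import TypedDict


class PointDict(TypedDict):
    x: int
    y: int


@dataclass
class PointData:
    x: int
    y: int


as_dict: PointDict = {"x": 1, "y": 2}
encoded = json.dumps(as_dict)  # works, it is a plain dict

try:
    json.dumps(PointData(1, 2))
    dataclass_serializable = True
except TypeError:
    dataclass_serializable = False

print(encoded, dataclass_serializable)  # {"x": 1, "y": 2} False
```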


An advantage of dataclasses over dicts is that you can add methods and properties.

Also you can easily convert a dataclass to a dict with dataclasses.asdict. Not so easy to go from dict to dataclass though
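
A sketch of both directions, using hypothetical `Point` and `Segment` classes. `asdict` recurses into nested dataclasses, while going back with `**` only works cleanly for flat classes; nested fields need to be rebuilt by hand or with a third-party library.

```python
from dataclasses import asdict, dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:
    start: Point
    end: Point


segment = Segment(Point(0, 0), Point(1, 1))
as_dict = asdict(segment)  # nested dataclasses become nested dicts

# Flat case: keyword expansion is enough.
flat = Point(**{"x": 1, "y": 2})

# Nested case: naive expansion leaves dicts where Points are expected.
naive = Segment(**as_dict)
rebuilt = Segment(start=Point(**as_dict["start"]), end=Point(**as_dict["end"]))

print(as_dict)
print(isinstance(naive.start, dict), rebuilt == segment)  # True True
```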



