A Failed Experiment with Python Type Annotations (mortoray.com)
127 points by ingve on June 11, 2019 | 80 comments



> mypy has no trouble understanding this, but it’s unfortunately not valid Python code. You can’t refer to Node within the Node class.

No, the workaround is to stringify "Node".

    from typing import Sequence

    class Node:
        def add_sub(self, sub: 'Node'):
            ...

        def get_subs(self) -> Sequence['Node']:  # or quote the whole thing: 'Sequence[Node]'
            ...

works just fine. As of Python 3.7, this stringification is done automatically under a `from __future__ import annotations`, and it will eventually become the default, so the original code will be valid.

> This complexity helped drive the introduction of the auto keyword to C++. There are many situations where writing the type information isn’t workable. This is especially true when dealing with parametric container classes,

This is absolutely wrong! You cannot annotate a function as returning `auto` in C++. `auto` and its variants in other languages are useful for eliding redundant type declarations, especially long ones that use generics/templates. `auto foo = MyContainer<Tuple<int, str>>();` or whatever is nicer than having to write the type declaration twice. But in C++ or Java, if you write a function that returns a MyContainer<Tuple<int, str>>, you have to write that out in the function signature.

This is an intentional choice: function declarations are your APIs, and explicit and clear APIs are useful for human readers of your code. That's why you should annotate your public APIs[0] even when you could have them be type-inferred. Speaking of which, if you want type inference, check out pytype[1]; it's like mypy, but it does do type inference on unannotated code. But you should still annotate your public APIs. It serves as a sanity check that you aren't accidentally returning something you don't expect.

And of course, being familiar with the differences between iterables, iterators, sequences, containers, etc. is not a bad idea.

[0]: https://google.github.io/styleguide/pyguide.html#3191-genera...

[1]: https://github.com/google/pytype


"[...]You cannot annotate a function as returning `auto` in C++. `auto` and its variants in other languages are useful for eliding redundant type declarations. Especially long ones that use generics/templates. `auto foo = MyContainer<Tuple<int, str>>();` or whatever is nice than having to double write the type declaration. But, in C++ or java, if you write a function that returns a Mycontainer<Tuple<int, str>>, you have to write that in the function.[...]"

Maybe I'm misunderstanding something, but one can write a function auto foo() in C++ and the return type is inferred. Lambda functions infer the return type by default.


Yuck, apparently I missed this happening. It's still (I think) true in Java, where you can't have a function return `var`.


You're correct about Java. It was an explicit decision that only local variables' type declarations and lambda parameters may be inferred with `var`.

This excludes, as you mentioned, the return types of a function and its parameters' types, as well as the types of fields in a class.


It’s only true if you have the function body available for deduction. Headers/decls need types.


There's a great 5-minute lightning talk from PyCon 2019 explaining the difference between pytype and mypy[0]. It's great that there are different libraries approaching the problem from different angles. Speaking of which, I'm also curious to see how Facebook's Pyre[1] differs from the former two. I'm surprised there isn't a comprehensive comparison of the three libraries yet.

[0] https://www.youtube.com/watch?v=yFcCuinRVnU&t=38m25s

[1] https://github.com/facebook/pyre-check


Pyre is similar to mypy, in that it:

1. Is gradually typed,

2. Doesn't infer types, and

3. Is strict, in that it doesn't allow operations that change types.

It was originally developed as a replacement for mypy that was faster and scaled better to very large codebases.


> This is absolutely wrong! You cannot annotate a function as returning `auto` in C++.

You can do that since C++14.


> But you should still annotate your public apis.

The good thing about inference (especially with a REPL) is you can write it without the annotation, and then use the inferred type (in Haskell, I usually find that when I resist the temptation to explicitly annotate types, the actual types are more general than I would have specified.)


In Haskell, I usually find the quality of error messages to be much worse without top-level annotations.


I agree with that, too. My usual practice with Haskell is to leave types off to leverage the information gained from type inference (with the intent of annotating signatures when I'm done), but then to tell Haskell what I'm thinking the types should be if things break with impenetrable error messages.

But my coding in Haskell is pretty much personal and toy projects; I like the approach, but it may not be ideal for coding in anger.


This is generally true with pytype as well, but questionable whether it's a good thing.

Inferred, loose parameter types are great, because while I think it's good to know the differences between iterator, generator, iterable, sequence, container, and so on, it's difficult, and inference will give you the loosest one. But with return values, the inferred type will be the tightest one; in other words, your function will be annotated as returning a Dict (or worse, a defaultdict) instead of a Mapping or MutableMapping. You almost always want a Mapping.
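For example (a minimal sketch with a made-up function name), writing the return annotation yourself lets you expose the looser Mapping even though the body builds a concrete dict, which is what inference would surface:

    from typing import Dict, Mapping

    def count_words(text: str) -> Mapping[str, int]:
        # The body uses a concrete dict, but callers only see a read-only
        # Mapping; an inferred return type would expose Dict[str, int].
        counts: Dict[str, int] = {}
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1
        return counts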


As well as pytype, you can use a separate tool to discover the correct annotations, then verify them and add them to your source code. There are a couple - MonkeyType, pyannotate and pytypes.


I recently added type annotations to my Python 3.7 project and found it very helpful. Just the process of adding the annotations caught a handful of bugs!

Some things I found useful:

1. from __future__ import annotations

As I mentioned in another comment, this lets you write annotations naturally without worrying about when something gets defined.

2. typing.TYPE_CHECKING is only True while type checking.

This allows you to conditionally import files that would otherwise cause a circular dependency, so that you can use the symbols for annotations (see the sketch after this list).

3. def foo(bar: str = None) -> str

If an argument defaults to None, then mypy will recognize that its type is actually Optional[whatever]. So in the example above, bar is an Optional[str]. (I'm not sure if this is mypy-specific.) Optional[T] is equivalent to Union[T, None].

4. You can make typedefs easily.

MyType = Union[Sequence[str], Dict[str, int]]

Makes writing complicated annotations easier.
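Putting points 2-4 together (a minimal sketch; the imported module name is hypothetical):

    from __future__ import annotations

    from typing import TYPE_CHECKING, Dict, Sequence, Union

    if TYPE_CHECKING:
        # Only imported while type checking, so no circular import at runtime.
        from myproject.records import Record  # hypothetical module

    MyType = Union[Sequence[str], Dict[str, int]]  # point 4: a typedef

    def foo(bar: str = None) -> str:
        # Point 3: mypy treats bar as Optional[str] because of the None default.
        return bar if bar is not None else 'default'

    def load(record: Record) -> MyType:  # Record is usable thanks to points 1 and 2
        ...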


If your code doesn't need to be py 2/3 compatible, it's fine to annotate using str. But if it does, typing.Text and bytes are better.


Am I missing something? Annotations aren't supported by Python 2 so if you're using them then you never need to worry about Python 2/3 compatibility.


Yes and no.

Annotations in comments or in type stubs are supported in Python 2 (you can look at typeshed for typing.Text and conditional py2/3 stuff). There are also some other cases, but they're... unique.
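For example, a comment-based annotation in mypy's Python 2 mode looks like this (a minimal sketch):

    from typing import List

    def scale(xs, factor):
        # type: (List[int], int) -> List[int]
        return [x * factor for x in xs]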


mypy supports comment-based annotations for Python 2. https://mypy.readthedocs.io/en/latest/python2.html


Good to know, but I hope to never need this.


The first issue is fixed by just using a string. Yes, you can't do Sequence[Node], but you can do Sequence["Node"].

The docs talk about this here ("class name forward references"): https://mypy.readthedocs.io/en/stable/kinds_of_types.html#cl...


For Python 3.7, you can do:

    from __future__ import annotations
This defers the evaluation of annotations (they're stored as strings rather than being evaluated when the definition runs), allowing you to use the actual symbol instead of a string.
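With the import in place, the earlier example type-checks without quoting (a minimal sketch):

    from __future__ import annotations
    from typing import Sequence

    class Node:
        def get_subs(self) -> Sequence[Node]:  # no string needed
            ...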


TIL!


OK, that works, but it does seem dumb, or even deliberately obtuse.

The language could just as well recognize Node as it recognizes 'Node', but instead it's left as a burden for the human programmer to handle.

Is it maybe because mypy is not really part of the Python language, and can't really make changes to it?


This is not mypy, it's "python-the-language": the annotations are values, and, in this case, the method annotations are evaluated when encountered - which is before the end of the class. So you are trying to use a value still not defined. It's like doing something like

    def foo() -> Bar:
       ...

    class Bar:
       ...
obviously the Python interpreter cannot find a "Bar" definition when it evaluates foo's annotations.


You're saying this is how the language works.

I'm saying the language could work differently.

The case of referring to a class inside its definition is quite different from your "Bar" case. The thing referred to is already declared, it's just not fully defined yet.

There is no logical reason I can see making it impossible for Python to honor the obvious intention of the programmer here. It seems it has just chosen not to do so. Though I'll admit I haven't thought through every weird corner case :)


PEP 563 is intended to resolve it: https://www.python.org/dev/peps/pep-0563/

As rcfox said, you can already enable this behavior, and it is slated to become the default in Python 4.0.


The language will recognise it in a future version by default, or a from future import can be used now.

mypy isn't behind (here), it's ahead.


Great!


The second issue is a bit of a judgment call too, I think. If the types were inferred you could end up with errors that you're not able to see because inferred types end up compatible. Personally, I think I'd rather write them manually so that I'm validating whether the method actually matches what it was intended to return.

I can't find a more authoritative source right now, but I also remember seeing the same thing that this SO answer says - that it was a deliberate choice not to do this so that people could incrementally add type annotations: https://stackoverflow.com/a/38775381

There are also some separate tools that can infer and generate the annotations for you. I haven't tried them personally, but the mypy docs recommend MonkeyType: https://mypy.readthedocs.io/en/stable/existing_code.html#aut...


I really like using and writing type annotated Python, but every time I do, I wonder if I should just use a language where all the annotation time I put in gets me actual runtime improvements.


And accurate compile time checks.

I recommend checking out F# if you want mostly inferred, strong static types and terse Pythonesque syntax (few braces, significant whitespace). It's also much faster.

Considering you can use all of .NET from F# you can benefit from the work on C# as well.


I'll also wave the F# flag here, but there's also Scala if moving to Javaland is better for you than Dotnetland.


While the first complaint is just a reflection of the lack of good tutorials on mypy, the second one is very valid: I too wish for automatic type inference for obvious cases.

What's more, while I found mypy useful myself in several instances, it's still cumbersome to use:

- you must know to use the magic command-line arguments (e.g. --ignore-missing-imports) to avoid mypy complaining about other libs and imports all the time

- you need to use List[], Dict[], Set[], Iterable[], Tuple[], etc. instead of list[], dict[], set[], iter[], tuple[], etc., which means an import in almost every file and a very unnatural workflow.

- you need to use Union instead of |. This is ridiculous. I have to do:

    from typing import Union, List

    def mask(...) -> List[Union[bool, int]]
Instead of:

    def mask(...) -> list[bool|int]
And Guido explicitly rejected the proposal for those on GitHub, despite the fact that most typing imports are due to those.

Now, mypy has improved a lot. It's way faster, catches many more things than before, produces far fewer false positives, and has support for duck typing and dunder methods (via Protocols).

But it's sad to think that TypeScript is actually easier to use. Having a JS tool that is easier to use than a Python tool is a good sign we can improve things.


I would have to disagree with this post (first point is actually inaccurate and the second I could go either way on).

Type annotations/mypy, especially when coupled with dataclasses/pydantic, have been very helpful in maintaining a rather substantial 3.7 codebase.


I agree. I'm currently working on a project containing 45k lines of Python code and we have really benefited from having type annotations.


> rather substantial 3.7 codebase

What size codebase is substantial?


Excluding tests/patches to third party libs, around 65K LOC


A rather sizable one.


Why would you keep Python if you want static typing? I mean, there are several modern, performant (more than Python actually) statically typed languages. Rust, Go, Kotlin, Scala, Swift… All of them are mature and have a good library ecosystem. So, except if you need some math/AI/ML packages available only for Python, why bother? Choose the right tool for the job, no?


In general, for a new project of any size, or significance, or where performance/concurrency was a concern; hands down, yes, I would absolutely reach for a strong, statically typed, compiled language.

The fact remains though that Python is an excellent general purpose language, easy to learn, almost universally known, has libraries for everything (and that’s just the stdlib). It is well suited to rapid prototyping, scripting, and data wrangling. Adding incremental typing makes it even better.


Scala: Python + dataclasses [+ mypy] offers a programming experience close to pragmatic Scala. The tradeoff becomes:

* worse performance

* lack of immutable vectors, alleviated via style conventions

* rarely, awkward lambdas

vs.

* wide pool of people familiar with the language

* lack of JVM lock-in: Scala Native is not there yet, and the library ecosystem is all JVM.

* [good] batteries included

* much better reflection

* access to modern ML

* gradual typing

* no actors

Rust: Manual memory management is verbose and distracting.

Go: Manual error handling is verbose and distracting.

Kotlin: JVM lock-in.

Swift: Apple lock-in.


No immutable vectors in Python? Are you not aware of tuples?

Lambdas in Python are quite limited, but offset by natural nested local functions.

Regarding performance, that's not always an issue, and depending on the use case it can be addressed with async/await or using a multiprocessing pool. I have had issues with multiprocessing and some DB libs in the past, but recently have had pretty good success.
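For example, a process-pool sketch for CPU-bound work (minimal and illustrative):

    from multiprocessing import Pool

    def crunch(n):
        # CPU-bound work runs in separate processes, so the GIL isn't a bottleneck.
        return sum(i * i for i in range(n))

    if __name__ == '__main__':
        with Pool() as pool:
            print(pool.map(crunch, [10_000, 20_000, 30_000]))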

Multithreading is still an issue because of the GIL (global interpreter lock) and is really only useful if you're calling into a C/C++ lib that releases the lock while it does its native thing.

I've written a lot of Python in the last 15 years, and I've written a ton of scripts that are faster than a well-written identical C/C++/C# app, and were written in 1/10th of the time. It just depends on the use case.


Sorry, I meant persistent vectors. Data structures with type Vector[T] instead of Tuple[T0, T1, ...], amortized O(1) updates and list-like syntax for literals, e.g. i[x0, x1]. It's a minor annoyance, in most cases using vanilla Python lists with the convention 'any list that is part of a dataclass / is passed across function boundaries should never be mutated' is good enough. As u/joshuamorton noticed in this thread, the convention can be enforced with mypy and Sequence[T], which is great!


> * lack of immutable vectors, alleviated via style conventions

Note that mypy/typed python fixes this: if you annotate a function as returning (or accepting) a Sequence/Collection/Iterable instead of a List, and attempt to mutate it, you'll get type errors. This is good practice anyway.
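A minimal sketch of what mypy reports (the exact error wording varies by version):

    from typing import Sequence

    def total(xs: Sequence[int]) -> int:
        xs.append(0)  # error: "Sequence[int]" has no attribute "append"
        return sum(xs)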


Thanks for the tip!


> Swift: Apple lock-in.

Swift runs on Linux and has done so since it was made open source in 2015, a year after its initial announcement. Windows support is being worked on but, for the time being, it runs fine in WSL. No lock-in.


How's the cross-platform library ecosystem? My distant sentiment is that the bulk of libraries are tied to / provided by Apple, but I'd be happy to be proven wrong.


Cross-platform stuff is mainly non-GUI since the main thrust of the cross-platform scene seems to be server-side Swift, and most of the GUI stuff is targeted to iOS (although with Project Catalyst, that's all implicitly valid macOS code now).

I expect we may see some third-party, cross-platform reimplementations of SwiftUI very soon once the appropriate language features are set in stone and ship with the next version of Swift.

Otherwise, it's fairly healthy — lots of computational libraries, database, web, etc. Plus, Swift pretty much has direct compatibility with C and Python libraries without needing to wrap anything.


> Plus, Swift pretty much has direct compatibility with C and Python libraries without needing to wrap anything.

Could you please elaborate on this point?


C compatibility is one of the original design goals for Swift and is used extensively for the Linux version, for making Swift wrappers around existing libraries, and the Objective-C bridge on macOS. If using Xcode, one would use a bridging header; otherwise, one can use Swift Package Manager to wrap C libraries for use in Swift projects.

Swift recently gained the features needed to call Python code. You can either use the Python module from Swift for Tensorflow or the PythonKit library, both of which allow calling Python code from within Swift.

The features used to allow calling Python from Swift can also be used for other dynamic languages, so there may be the ability to call Ruby and other languages' libraries in future as well.


>Scala: JVM lock-in

>Kotlin: JVM lock-in

You know OpenJDK has been available for over a decade, right?


> Seems like all of this (adding static typing on top of Python, PHP, etc) started with the success of TypeScript

Mypy is a few years older than TypeScript, so, no.


Yep, I already removed that part since I was not sure about the history, and it actually doesn't matter who started the trend.


> Choose the right tool for the job, no?

The thing is, static typing is never the job by itself.


"job" doesn't refer to static typing but the project you are working on.

- script of 2,000 LOC to clean data => dynamic typing is fine => Python

- web service of 1,000,000 LOC => static typing seems a better choice => Rust, Go, Kotlin, Scala, Swift, C#, Java…


A lot of people don't set out to write a million LOC web service, they build a smaller web service and it grows.


Quite. My company is using Go for most new projects, but we have five years of Python code that we can't justify rewriting any time soon. We've been able to add annotations incrementally instead.


Getting a whole team to try out static typing on a language that they already know is far easier than getting a whole team to learn an entirely new language.


Sometimes the right tool for the job is python + mypy.


Sometimes the job is "maintaining existing code," and nobody is going to pay for a ground up rewrite.


This is a really poor article with a click-bait headline. I was expecting this to be a case study where Python type annotations just couldn't work due to some interesting specifics of the projects. Instead, it's some petty gripes about the syntax of annotations not being just to the author's liking.


The first point can be rather easily circumvented by using strings as forward references and it only takes a bit of googling to figure that out. The second doesn't seem remotely like a deal breaker to me. Half the reason for having type annotations is to provide a description of what a function or method does at a glance. This is idiomatic even in hardcore functional languages like Haskell. With return type inference, you don't get that benefit.


It seems to me that type inference would be silly in this case. Type suggestion during mypy checks might be useful though.

Function x does not specify a type, but returns SomeHelpfulTypeString. We recommend you use that.

Put it behind a flag and call it a day. Maybe even have a way of requesting type suggestions for functions during development.


I think the premise of this blog post is misguided.

Yes, there are two things about introducing types that the author finds annoying. So don't do those two things, and use type annotations for all the rest.

The type annotation system and mypy are designed to give you benefits even if you don't fully annotate everything -- that's the starting point of nearly all existing python code bases.


Other people have mentioned that the self-referential case from this post is doable. But what is not doable (to my knowledge) is this kind of self-reference:

    import typing

    class Node(typing.NamedTuple):  # or dataclass
        child: 'Node'
The error is example.py:4: error: Recursive types not fully supported yet, nested types replaced with "Any".

This error has been in there for several years at least. And it always bites me when I forget. It's particularly cryptic when the self-reference is many levels deep. I've even had the error message break in these cases where it prints out a line number from the wrong file. So, then you have to hunt to see where the self-reference is hiding.


Just tried this myself, and it seems to only be an issue for typing.NamedTuple. Though, it still manages to do type checking correctly...

    from __future__ import annotations
    from typing import Optional, NamedTuple
    from dataclasses import dataclass

    class Foo:
        def __init__(self, foo: Foo = None) -> None:
            self.child: Optional[Foo] = foo

    Foo(Foo(Foo(None))) # Works
    Foo(Foo(Foo(1))) # error: Argument 1 to "Foo" has incompatible type "int"; expected "Optional[Foo]"

    @dataclass
    class Bar:
        child: Optional[Bar]

    Bar(Bar(Bar(None))) # Works
    Bar(Bar(Bar('bar'))) # error: Argument 1 to "Bar" has incompatible type "str"; expected "Optional[Bar]"

    class Hep(NamedTuple): # error: Recursive types not fully supported yet, nested types replaced with "Any"
        child: Optional[Hep]

    Hep(Hep(Hep(None))) # Works
    Hep(Hep(Hep({'a': 'b'}))) # error: Argument 1 to "Hep" has incompatible type "Dict[str, str]"; expected "Optional[Hep]"


It looks like the constructor is being typechecked, but probably not attribute access. `val.child.junk` will pass because `val.child` is `Any` and you can do any-thing to an Any.


That seems to work too...

    a = Foo(Foo(Foo(None)))
    b = Bar(Bar(Bar(None)))
    c = Hep(Hep(Hep(None)))

    a.child.junk = 1
    # error: Item "Foo" of "Optional[Foo]" has no attribute "junk"
    # error: Item "None" of "Optional[Foo]" has no attribute "junk"

    b.child.junk = 'a'
    # error: Item "Bar" of "Optional[Bar]" has no attribute "junk"
    # error: Item "None" of "Optional[Bar]" has no attribute "junk"

    c.child.junk = 1.1
    # error: Item "Hep" of "Optional[Hep]" has no attribute "junk"
    # error: Item "None" of "Optional[Hep]" has no attribute "junk"

    a.child = 1
    # error: Incompatible types in assignment (expression has type "int", variable has type "Optional[Foo]")

    b.child = 1
    # error: Incompatible types in assignment (expression has type "int", variable has type "Optional[Bar]")

    c.child = 1
    # error: Property "child" defined in "Hep" is read-only
    # error: Incompatible types in assignment (expression has type "int", variable has type "Optional[Hep]")
I'm using mypy 0.701 with Python 3.7.3.


Hey, thanks a lot for pointing this out! Yet another reason to use data classes vs. named tuples.

Looks like just plain class level attribute declaration also works:

    class Node:
        child: 'Node'
I wonder why `typing.NamedTuple` is unique in not working. I know they don't use `eval` and templating to create it anymore. From looking at the code [0], they're using a metaclass / __new__. But other than the fact that it uses metaclasses, I'm not sure why it'd have an error. Obviously there's cycle handling in mypy, otherwise none of the examples would work. If I use an example with a metaclass, that also typechecks. So, it's not metaclasses that trigger the error.

    class NodeBase(type):
        def __new__(cls, name, bases, attrs):
            return super().__new__(cls, name, bases, attrs)


    class Node(metaclass=NodeBase):
        child: 'Node'  # works
Edit: it looks like there is now work going into this and other cases where recursive types don't work [1][2]. For example: `Callback = typing.Callable[[str], Callback]` and `Foo = Union[str, List['Foo']]`.

[0] https://github.com/python/cpython/blob/3.8/Lib/typing.py#L15...

[1] https://github.com/python/mypy/issues/731

[2] https://github.com/python/mypy/issues/6204


I cannot stand typing in Python. It's never given me any saving grace in a project, only headaches.


What's the status with numpy types? I find two projects, https://github.com/machinalis/mypy-data/tree/master/numpy-my... and https://github.com/numpy/numpy-stubs but neither looks very complete and neither has been updated recently.


Try Kotlin. Syntax is similar to Python (but better), but it's actually just Java, with all the supporting ecosystem. Kotlin is just syntactic sugar.


> Try Kotlin. Syntax is similar to Python (but better)

Well, from [1], I'd say syntax is closer to C.

[1] https://kotlinlang.org/docs/reference/control-flow.html


What's possibly so cumbersome about defining return types?

It does seem a little embarrassing that the annotations don't even support circular references.


Circular references like that are called "recursive types" in formal type theory. They are one of those big issues that academics like to write papers about.

Even if you just want to write a practical interpreter, and choose to gloss over the issues, they will still come back in some disguised form, either by requiring some sort of implementation kludge or by creating weird edge cases.
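As a concrete illustration (a sketch; as the error message earlier in this thread shows, mypy at the time fell back to Any for the nested reference), a recursive type alias looks like this:

    from typing import Dict, List, Union

    # A JSON-ish value refers to itself through the container types.
    Json = Union[None, bool, int, float, str, List['Json'], Dict[str, 'Json']]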


And how is that an issue for type checking? Would you elaborate?


The author of the submission didn't even bother to Google that issue, which is solved, by the way.


> Or do you have any idea what type get_closure returns?

It doesn't.

Because of the syntax error.


I feel baking type annotations on top of dynamic, late-binding languages is fundamentally misunderstanding what they are about. Python might not go as far as, say, Smalltalk, but if you're picking a dynamic language, just accept the runtime dynamism and don't program against it.


Or well, accept the dynamism and use type declarations as automated declarative test coverage, at the very least.



