I just tried taking the article's advice and running mypy on some of my code (completely unannotated) to see what came up. Some interesting things did; I've already made 3 patches.
One comment to the article author: tell people how to install mypy. I started by trying 'pip install mypy', then wondered why I didn't end up with a mypy executable (you need 'pip3 install mypy-lang'; 'mypy' is an unrelated package).
Honestly I haven't looked into MyPy a huge amount, and haven't used it yet. But from reading the docs on it I feel like it's taking the wrong approach. In my opinion, Python is all about duck typing, so having a static type checker that seems to focus on concrete types feels wrong to me.
The type here is way more specific than it needs to be. Not only will this work on any sequence type (anything that supports __iter__), the element type only needs to support __add__ and __radd__ (to work against the implicit seed value of sum).
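For reference, the article's code isn't quoted in this thread, but from the later comments (a sum_and_stringify function taking a parameter called nums) it seems to be roughly the following sketch; the annotation pins the argument to a concrete List[int] even though the body would happily take any iterable of addable things:

from typing import List

def sum_and_stringify(nums: List[int]) -> str:
    return str(sum(nums))

sum_and_stringify([1, 2, 3])     # what the annotation allows
sum_and_stringify((1.5, 2.5))    # works at runtime, but the List[int] annotation would reject it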
I really feel like in order to get the best of duck typing, you need a good syntax for describing types by usage, not by explicit name. Please let me know if I'm missing something, and this is actually more expressive than the documentation suggests.
Well, I like to put that another way -- I think Python is all about protocols (in the Python sense of the word protocol). I think that is what most people mean when they say Python is all about duck typing, and it communicates the concept more precisely.
The concept of 'type' is overloaded with too many meanings these days. It started out as a what-storage-to-allocate and which-instruction-to-generate (int v float) concept (FORTRAN I, variables starting with I..N are ints, others real). Deep class hierarchies muddied the waters by overlaying the idea of whether or not a class implements a particular property or method, and how to locate the property or method that you want.
Python shows us that types and protocols/interfaces are two very different things. In Python, asking 'isinstance()' is usually a bad idea unless you are doing something very low level, low level as in next stop is a C extension. The missing predicate in Python is 'hasprotocol()'. Abstract base classes can be used to accomplish that: isinstance(foo, SomeABC), but I think the intent could be more transparent with different syntax.
Yeah I agree with you completely, I think we just use different terminology to describe the same thing. It sounds like you use "type" to mean "concrete type", i.e. the thing it actually is, and not the thing it acts like, and then "protocol/interface" to describe the behaviour, the thing we can use it as. As somebody who mostly uses statically-typed languages, I use "type" to mean "the thing my static type checker knows this as", which could be a concrete type or simply an interface, and then obviously I use the term "concrete type" when I need to be more explicit.
I like the idea of a hasprotocol or supportsprotocol predicate, certainly over usage of isinstance, since one of the beauties of duck typing is that it works without requiring the type hierarchy. The tricky part will be in allowing you to define your own protocols, like "has these attributes/methods". An example would be the difference (or lack thereof) between classes and namedtuples.
Interestingly, I think that's what isinstance() actually does when it's called on abstract base classes like collections.abc.Iterable -- it looks to see whether the type supports __iter__ and whatnot. See the abc module and its implementation here:
# excerpt from collections.abc.Iterable; ABCMeta consults this hook for isinstance()/issubclass()
@classmethod
def __subclasshook__(cls, C):
    if cls is Iterable:
        if any("__iter__" in B.__dict__ for B in C.__mro__):
            return True
    return NotImplemented
Wow, hadn't seen this before. That's exactly what I mean. If I can define my own, e.g. Addable, which does a similar check for the presence of __add__ (and __radd__) and MyPy uses the subclasshook, then that'd be perfect.
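Something like this, say (a rough sketch: the name Addable and the exact check are made up here, and it only affects runtime isinstance checks; whether mypy would honour it is exactly the open question):

from abc import ABCMeta

class Addable(metaclass=ABCMeta):
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Addable:
            # accept any class whose MRO defines both __add__ and __radd__
            if all(any(m in B.__dict__ for B in C.__mro__)
                   for m in ("__add__", "__radd__")):
                return True
        return NotImplemented

print(isinstance(3, Addable))         # True: int defines __add__ and __radd__
print(isinstance(object(), Addable))  # False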
> Python shows us that types and protocols/interfaces are two very different thing. In Python, asking 'isinstance()' is usually a bad idea
Interesting perspective. This would seem to be pretty well described by parametric polymorphism with constraints (Haskell-style polymorphism with typeclasses). In that context, too, we are often well served staying as generic as possible, and asking at runtime about the type of something is a bit of a code smell (and only possible where you have a Typeable constraint available).
Yes. Haskell's type deduction is exactly what I was thinking of as I read the article. You write a function using what you need and it deduces the requirements based on usage, giving you the most general type. I think this is actually very close to compile-time duck-typing. Actually I think if you took the class declarations out of Haskell and removed the instance lines (but left the subsequent implementations), and had the compiler deduce typeclasses for you, you'd get something that felt very much like compile-time duck-typing.
If you want an even closer example, it would be OCaml with its structurally typed object system with inferred types. For example:
# let f obj x y = (obj#m x)#n y;;
val f : < m : 'a -> < n : 'b -> 'c; .. >; .. > -> 'a -> 'b -> 'c = <fun>
Here I defined a function taking 3 arguments, called method m on obj, passing x as argument, then called method n on the result of that, passing y as argument.
Note the inferred type signature for this: the first argument is of type <m : 'a -> <n : 'b -> 'c; ..>; ..> - i.e. any object that has a method named m (and possibly some other unspecified members - that's what ".." means), with said method m having a signature that allows us to pass some 'a, and returning another object of type <n : 'b -> 'c; ..> - i.e. something with a method named n that accepts 'b and returns 'c. 'a and 'b are then used as the types of x and y, respectively, and 'c is the result of f.
So effectively, this is full-fledged duck typing in the vein of Python objects, but captured on type system level.
I had heard of the support for a hierarchy among some builtins, like float being treated as more general than int, but this isn't a very satisfactory approach because it's not extensible to user-defined numeric types. Perhaps what I'm suggesting is possible with either the existing classes in abc, or by defining more like them (and ensuring users can define their own). Would be nice to see more focus on this in the examples if that's the case.
> I really feel like in order to get the best of duck typing, you need a good syntax for describing types by usage, not by explicit name.
Not sure if this is of interest to you, but the Nim[1] programming language offers a feature that allows you to do exactly what you described. It is called concepts and is documented here: http://nim-lang.org/docs/manual.html#generics-concepts
Yep that sounds like what I mean. Haskell typeclasses, Go interfaces, TypeScript interfaces are further examples. Not Java/C# interfaces though, as they need to be declared at design time, and you can't force existing classes to implement interfaces they haven't already declared themselves as implementing.
Yeah but as I said it can still go far more general than just int, float and complex. It could be any arbitrary class defined by somebody else that I could not possibly know about at design time, and so a whitelisting approach of concrete classes is simply not going to cut it. It would be nice to have a way to describe the actual requirements (only that the elements support __add__ and __radd__).
Well, my joke was that numbers.Number exists, and I think supports those operations, but even someone complaining about this topic is unaware that it exists.
I find it a bit easier to treat it as documentation, instead of literal types
mypy will then tell you if you used your documented code wrong, according to your documentation
Also, sum excludes summing of strings if I remember correctly, so it's any object that has an __add__ and __radd__ and isn't a string or subclass of string, which is just a bit insane to try and express on one line!
Python's implementation of sum doesn't care about types; the function looks something like:

def sum(iterable):
    r = 0
    for v in iterable:
        r += v
    return r
it always starts with an integer, and since int.__add__ returns NotImplemented for a str argument, Python falls back to str.__radd__, which doesn't exist, so the += raises a TypeError: no summing of strings and integers.
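A quick way to see that fallback, using nothing but the builtins:

print((0).__add__("a"))           # NotImplemented
print(hasattr(str, "__radd__"))   # False
0 + "a"                           # raises TypeError: unsupported operand type(s) for +: 'int' and 'str'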
That's a fair point on the lack of support for strings by sum, but note this is only because the function is hard-coded explicitly to reject them and only because it's not the best way to build strings. By usage, strings should just work.
On the point of documentation, this is potentially a fair point if you consider that you probably wrote this function for a particular usage and you know at design time that you're really only ever going to give it a list of ints (if you strip the types away, the parameter is still called nums which suggests that's what was really intended). In reality, the function is actually much more powerful than that, and if included in a library, many people could use it for many more things than just that.
> In reality, the function is actually much more powerful than that, and if included in a library, many people could use it for many more things than just that.
I think all this says is that design and documentation considerations for functions in distributed libraries may be different than those for functions used locally. Something I certainly agree with, but not really a criticism of the parent point.
Fair point. I realise I sound quite critical in my comments so perhaps I should add that I like the tool itself. The problems that I see are more in the usage as shown in the examples. I am not necessarily suggesting that in all circumstances the most general types are the best. But I have yet to see this tool used in such a way as to permit the most general types, as would be appropriate for library functions.
> if you strip the types away, the parameter is still called nums which suggests that's what was really intended
Correct and concise documentation is incredibly valuable.
For documentation to stay correct, it must be checked and updated as situations change. Types and assertions can both play this role. Tests can play this role.
Names also play this role - they are not mechanically checked, but "is my use here consistent with the name" is a check any programmer is applying as they work. This actually argues a bit against renaming refactor tools - if I am changing a name, I should be making sure that it squares with existing uses. From this perspective, checking of function parameter names is weaker than checking of function names, since the names of the parameters typically do not appear at the use site - very different from the assignment of other variables.
You can't say it is more specific than it needs to be without knowing the author's intent AND the semantics of all the __add__ methods of all types that have these methods.
Which is for practical purposes impossible.
Semantically, the presence of __add__ and __radd__ is a minimum, not sufficient.
Well sure but that's just the danger of duck typing vs explicit interface implementation, something Python already suffers from. Without MyPy, Python will already let you use this function on all sequences of addable things, not just lists of ints. It would be good if MyPy reflected this as closely as possible, otherwise you're throwing away the advantages of Python's duck-typing.
Edit to add: To be clear, sure the result of calling this on any arbitrary sequence of addables might be semantically nonsense, but that's not the purpose of a type checker. It's only to verify type safety, i.e. you won't get runtime type errors if you give me at least this kind of thing.
Part of the issue is the name, which is generic enough that with the generic implementation, it probably also wants a maximally generic type.
A more constrained type throws away "the advantages of Python's duck-typing", but it also throws away disadvantages, and that can be precisely what you want.
Yes, this is a good point too which I certainly agree with. Sometimes by using the most general type, you actually permit more than you intended and you might restrict yourself from making refactorings to the function because it would restrict the types.
> might be semantically nonsense, but that's not the purpose of a type checker. It's only to verify type safety, i.e. you won't get runtime type errors if you give me at least this kind of thing.
We either disagree about the purpose of a type checker, or we disagree about what can constitute a "runtime type error". If a type checker can catch inconsistencies that would lead to my returning "semantic nonsense", at low enough cost, I absolutely want it to. Obviously it won't be able to catch all such inconsistencies in all cases.
I understand your point, and perhaps we don't actually disagree on this. As an example, in Haskell (which you have shown in another comment that you are familiar with), this function would perhaps best be written as
sum_and_stringify :: (Show a, Monoid a, Foldable t) => t a -> String
sum_and_stringify = show . foldl mappend mempty
In this case there are additional rules (semantics, though not in the language) which we associate with monoids like associativity that don't apply to all "addables" (e.g. strings), that might make us feel more comfortable being so general. I think really it's just that we have more expressive language to describe our expectations. Though there's nothing stopping you from making String an instance of Monoid, or most any type for that matter, and producing nonsense. It's not a type error, so I say it's (in this case) not the job of the type system. But there are more expressive type systems that could let you guarantee these additional properties like associativity and turn nonsense into type errors, and I for one am all for that.
In the case of Python, though, coming from no static type system or verification at all, I would only ask of this tool to turn runtime checks (type errors caused by arguments lacking __iter__ or __add__) into static checks. Anything on top of that would be fantastic, I agree with you on that, but I wouldn't ask for or expect it.
Finally, thanks for all your comments - I appreciate the discussion.
> Anything on top of that would be fantastic, I agree with you on that, but I wouldn't ask for or expect it.
Ah, we probably do agree more than it seemed. "The MyPy team has done its job even if that's not possible/reasonable" is a statement I'd endorse - especially in the short-to-medium term.
Discussing the Haskell...
String concatenation actually does associate; in fact String does form a Monoid over concatenation, and has an instance visible from Prelude - specifically the list instance, because Haskell strings are just `[Char]`.
The only really "nonsense" thing about using a string here that I see is that we wind up with quotes and escaping in the result when we `show` the string. If the purpose of this is "gather results and format for logging", that might be fine.
Numeric types don't have an instance of Monoid, because there are two to pick. You can pick them with newtypes, but here that's a bit awkward (`sum_and_stringify . map Sum`? feels redundant), and then the Show will give you something like "Sum { getSum = 3 }", which is probably not what you want either...
Using Num rather than Monoid probably has better ergonomics, and may be closer to intent anyway. That is to say:
sum_and_stringify :: (Show a, Num a, Foldable t) => t a -> String
sum_and_stringify = show . foldl (+) 0
Of course, we have a `sum` defined (basically, perhaps literally) `foldl (+) 0` so why even pull this out into a function? `(show . sum)` is actually substantially shorter than `sum_and_stringify` and no less clear.
I think the biggest issue here is that the example is so devoid of context (probably necessarily) that it's hard to say what good style would be. In actual code, a function with that name should probably have a very general signature - but we likely shouldn't be writing a function with that name. Why are we wanting to "sum and stringify" as one action in this way? Is that something that might change? Will the uses want to change together or are we just Huffman coding? With a domain-relevant name, we'd have a sense of what ways it makes sense to restrict the types.
> Finally, thanks for all your comments - I appreciate the discussion.
>String concatenation actually does associate;
Oh my goodness, my brain exploded apparently because I meant to say commute, although commutation is usually not something to be expected of monoids.
>Using Num rather than Monoid probably has better ergonomics, and may be closer to intent anyway.
You're probably right on this point, in all honesty. I think I'm over-generalising the original intent.
>With a domain-relevant name, we'd have a sense of what ways it makes sense to restrict the types
Yes, I think this is the classic helper function vs library function quandary. If you wrote it to help yourself right now, it's often good to be very specific with the name and signature. That way if later you need to add more functionality you can without changing the name or restricting the signature. If you wrote it as a library function, it's good to be generic, to let as many people use it as possible. And in this case there should never be a need to change the function, it does what it does. If you need something else you just create something else.
This resonates with some things I've found myself saying recently around misunderstandings of the DRY principle.
It's so catchy an acronym, and so seemingly self-explanatory, a lot of people don't realize it was originally framed in terms of knowledge, not syntax. So if syntax happens to be the same in N places, but in each of those places it is the way it is for a different reason, consolidating may not be "DRYing up the code". On the other hand, if I need to say "there's a text field here" in my markup and also in my logic, that could be more DRY even though the syntax looks nothing alike.
Tying this back here, we find as a corollary: when factoring out code we can ask ourselves "what piece of knowledge does this represent?" If it is knowledge about the language (this is a good way to sort things) it probably goes in a library, and should probably be very general. If it is knowledge about your domain (this is how we escape HTML), it probably goes in a domain-specific library, and it should probably be somewhat less general. If it is knowledge about your application, it probably goes in your application or an application-specific library, and should probably be pretty specific.
Protocols (think Java interfaces) are becoming a thing for type checking, so that's what you can use in places like this in the future. Something like `Iterable[Addable]`.
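As a rough sketch of where that's heading (typing.Protocol from PEP 544; the Addable protocol here is hypothetical, not something mypy ships, and the exact signatures you give its methods will affect what matches):

from typing import Iterable, Protocol

class Addable(Protocol):
    def __add__(self, other): ...
    def __radd__(self, other): ...

def total(xs: Iterable[Addable]) -> Addable:
    result: Addable = 0   # int matches the protocol structurally
    for x in xs:
        result = result + x
    return result

print(total([1, 2, 3]))   # 6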
This is great provided that Addable is not just something hardcoded to know about builtins like int and float. If I create my own class that quacks like an Addable, then it needs to be supported. That's the essence of duck-typing.
Thanks for the suggestion. For the record, I'm a fairly happy C# for work, Haskell for hobby programmer, using Python as my go-to for quick-and-dirty scripts. My comments here are mostly based on my own thoughts on the static vs dynamic debate, in this case specifically how to keep the advantages of dynamic languages whilst absorbing some of the benefits of static checking. They have nothing to do with the best usage of MyPy in production-grade Python projects, of which I have zero experience.
I am glad to see that this is optional. We need languages that mix compile/parse time (static) and run time (dynamic) types, to mix what fits best to each situation.
Of course type checking can catch some problems. The more extremist of the static-typing proponents seldom seem to ask about the cost of adding types, or about making some otherwise generic/overloaded operations harder to implement (e.g. excess clutter in some "wordy" type systems, particularly in the C lineage, where declared identifier names follow the other verbiage). We need more tools that allow us to easily nail down fixed interfaces to avoid errors, but still allow flexible / dynamic / metaprogramming type code when needed, without having to jump through hoops in either case.
Don't be a static typing extremist! (or throw the baby out with the bath-water, either, I guess)
I dunno, I find static types to be a core design tool that are highly beneficial from the earliest stage of a project. They help you probe a problem space as you develop, and offload a great deal of cognitive burden to your tools. I actually find myself working a great deal slower on new dynamically typed projects because I have to juggle a great deal more context in my head, and am encumbered by larger test suites that are more difficult to change.
Note that I am referring to type systems with succinct syntax, type inference, structural typing and ADTs.
If you can't keep the "types" of your design in your head, your design might be broken. I have a really bad short-term memory and can probably juggle fewer things in my head than most programmers. Yet with good taste for design / decomposition that's not much of a problem. Most of my code now is structured as tables, which greatly simplifies many transformations. There is no pressing need for typing tables, and anyway most of these type systems really suck at "table transformation algebra".
Here is a little list of drawbacks of static type systems with features like you describe (parametric types, ADTs...):
Type dependency hell (banana, monkey, jungle, etc), having to fight/convince the type inferencer to accept any but the simplest things, having to type many things twice.
Looking at Haskell, "almost everything is inferred" is a lie in my experience. You are encouraged to write code using higher-order and highly parametric functions. If you do, the error messages you'll get if you don't annotate are plain sick. Another problem is that the type system provides no convenient escape hatches, while it's not powerful enough for typical programs. Many "successful" example designs I've seen are either _really_ simplistic (to the point where it's obvious that the better design wasn't chosen simply because the type system makes it really inconvenient), or have some _really_ terrible internals hacking in it (which might be the largest part, be impossible to understand by most developers, and break with the next compiler version).
Looking at C++, a practical example is vector. It's a useful tool (automatic memory management), and very likely to be all over in your function signatures. This is very bad in terms of modularity / reusability, because your functions are not defined using reasonably general types (pointers to arrays).
Many of these very real pains just aren't there with Python code, which has better modularity at the cost of being a little sloppier.
These days I look at static types mostly as tools for generating fast machine code (I doubt that much more than C types is needed here) and as tools for enabling nice syntax (OOP protocols like in Python can do that too, but they are very dirty). These things however are only needed at the very optimized parts of your average project.
So I think a good way in many cases is to go the Python route, and check types dynamically at strategic points to assert value integrity and enable transition to highly optimized (with static types) routines.
Granted it's difficult to remember the type of the casual e-mail address or of a list of integers. In Haskell, where there are a thousand ways of encoding the most trivial things, that is.
In my own practice, most transformations don't involve relations over more than 5 variables with mostly obvious types. Yes, this is hard to use:
>Granted it's difficult to remember the type of the casual e-mail address or of a list of integers.
It would be a toy project that only (or mainly) has those as all the types you need to remember -- so that just proves my point.
Also the 'type' of an email address is an unsolved problem -- if we're talking type checking. It's not a string, and it's not solvable with regex either. So bad example all around.
You cannot do any kind of Domain Driven Design with dynamic typing.
In F# you can have a type email address and it will be impossible to pass an unverified string there.
It has to be a valid email address, and you will know it at compile time.
Good luck to try to achieve the same effect in python without writing tons and tons of useless tests just to check the type.
And if you can keep all the types of your system in your head, I don't think you ever worked in a real system.
You could make a property in Python that validates its given value and throws a ValueError if it's not an email. 6 lines of code max, with maybe two tests.
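Something along these lines, say (the regex is deliberately naive and the class/attribute names are made up):

import re

class User:
    @property
    def email(self):
        return self._email

    @email.setter
    def email(self, value):
        if not re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", value):
            raise ValueError(f"not an email address: {value!r}")
        self._email = value

u = User()
u.email = "alice@example.com"   # fine
u.email = "not-an-address"      # raises ValueError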
I have gone that route in Haskell and it was a mess. In the end you need to extract the string again (basically everytime you do something meaningful with it). All these pattern matches kill code readability/writeability.
Also, in Haskell one needs to encapsulate the real constructor in its own module to make sure the data is only constructed through a sanitizing constructor. That's quite some overhead (and makes pattern matching impossible).
What might be useful instead is first class "tags". In Perl6, you can subset types to values that match a condition (for example, strings that match an email regex (cough)). They are still strings so can be used unchanged with other functionality that does not need the tag. It seemed quite usable to me (applications are limited, though). However the creator said it's still expensive because they haven't figured out / implemented how to transfer invariants, so they are checked at each function boundary.
In Python, you can use duck typing: subclass str, it's a dirty hack but works very nicely.
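E.g. (EmailAddress is a made-up name; the point is just that the result is still a str everywhere a str is expected):

class EmailAddress(str):
    def __new__(cls, value):
        if "@" not in value:
            raise ValueError(f"not an email address: {value!r}")
        return super().__new__(cls, value)

addr = EmailAddress("alice@example.com")
print(isinstance(addr, str))    # True: works unchanged with anything that wants a str
print(addr.upper())             # ALICE@EXAMPLE.COM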
The real world solution to the whole mess seems to me to be just to design a good dataflow into the system, so it's clear where you have a valid email address. Typically it's very easy: Just assert the invariant at the boundary to the external domain (for example, HTML form parsing). Now whenever there is a variable "email" in your program you "know" that it is a valid email address.
I’ve gone that route in Haskell and it works great. Yes, you need to extract the string again. Not an issue unless you’re golfing all your code. You also don’t need to hide the real constructor if it’s a type your module should be working with.
Subclassing str seems like the most horrible way ever to represent an e-mail address. I would definitely write a wrapper instead – exactly like Haskell.
In python nothing can guarantee at compile time that you are not passing a username variable in a method that accepts an email address.
You'll just notice at runtime when you realise that you have to rollback your last deployment because new users are unable to receive the activation email that is being sent to their username instead of their email.
If you’re talking about runtime type checks rather than mypy, well, they’re just verbose static types with less safety. (May as well just embrace the duck typing at that point by accessing email.local_part and letting the AttributeError propagate.)
You’re making out having to explicitly specify a type when type inference doesn’t do a satisfactory job as some kind of extremely painful thing, but it’s not. You just define the type of your object, which is something you had to know anyways; now it’s written down and people reading don’t have to guess at or remember what it is.
Annotating even the last bit of code that uses type class methods until the error message becomes clear is very cumbersome. For one thing, I can't say (traverse :: The List instance), but have to dig out the whole annoying signature, or go look for a more strategic value where the same information can be put in fewer characters. For another thing, it often doubles the number of lines needed (or more if you like to put whitespace before types). It's definitely very annoying.
I'd be interested to hear more about Haskell projects where the type system causes bad designs. Maybe you could write a blog post about it or something?
I will think about it, but would probably not be comfortable with it.
For now a few words that I hope I won't regret later (apart from the things I mentioned): The bulk of invariants / assertions in many real-life programs can't be expressed with mainstream type systems at all. They are too specific or only come to life dynamically. Things like: This array of integers is sorted ascendingly, and that index is a valid index into this other array which points to a number between 48 and 58. Or this table's column over there contains only values that are also present in that other table that just happens to be present now (but must be expunged from working memory after the next subtask). Checking if an integer fits in 8 bits and then converting to a byte is no safer in Haskell than in any other language.
That's just to put the usefulness of mainstream types for static correctness into perspective.
Now much of the tinkering that is going on in Haskell projects seems to be shoehorning yet one more real-world invariant in another crazy type. Maybe the next brilliant idea works for one more toy case in the end. Great, but the cost/benefit ratio is just bad.
I think the most important thing in software architecture is modularity, because the most pressing need in software development is the need for change. If my project is very modular, it's sufficient to have clear entry points to each module with very simple data types in the signature. The internals can be a dirty hack and rewritten later, or the module thrown away when the idea turns out bad. Interfaces on the other hand can't be changed as easily since they affect much more code. So a good approach is making interfaces that accept tables, augmented by a textual description of invariants so the user easily "gets" the meaning of the data and can reshape/preprocess the data as needed. This enables isolated rewrites of modules. It also enables creative use of data in unforeseen ways. Clever types on the other hand have a very strong influence not only on the design of module interfaces but also on the dependent modules' implementations, because 1) it's more cumbersome to reshape complex structures in a way suitable for clean implementation, and 2) because type systems can't usually transfer invariants to reshaped representations (and after all the goal was to let the compiler handle the invariants, so reshaping is a no-go).
In short, complex types _kill_ modularity because they have far too much influence on data layout.
For real world programs, what Haskell is great at is statically finding trivial interface usage errors like, don't call this with a NULL pointer, or this parameter must be a functor instance (because the actual data might be too ugly to admit). Turns out that that's all nice and well, but the machinery is a huge cognitive and economic overhead, while running some python client code of a sane API once typically brings such usage errors to light just as well (profane things. TypeError raised. Seems it tried to iterate over an int, so you must have gotten the argument order wrong etc).
>For now a few words that I hope I won't regret later (apart from the things I mentioned): The bulk of invariants / assertions in many real-life programs can't be expressed with mainstream type systems at all.
Which is beside the point, since partial coverage of invariants is still very useful.
Having a small subset of invariants checked by the type system for sure is beneficial, however I was saying that the cost/benefit ratio is just not good enough (IMO of course: as a non Haskell expert, who is not a genius and has only spent 2-3 years learning it...)
> This array of integers is sorted ascendingly, and that index is a valid index into this other array which points to a number between 48 and 58
These are difficult to encode in a way that lets the compiler check the code that sorts or calculates an index.
They are trivial to encode in a way that simply tags the values. I still need to be concerned with making sure my sorting is correct, but that's a very isolated concern. I still find it very useful to know whether I've called a function on the array that depends on it being sorted, and that doesn't require any heavy lifting at all.
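The mypy analogue of such a tag would be something like typing.NewType; a sketch, with SortedInts and the helper names made up for illustration:

from typing import List, NewType

SortedInts = NewType("SortedInts", List[int])

def sort_ints(xs: List[int]) -> SortedInts:
    # the one isolated place that has to be trusted to establish the invariant
    return SortedInts(sorted(xs))

def smallest(xs: SortedInts) -> int:
    return xs[0]   # may assume xs is sorted

smallest(sort_ints([3, 1, 2]))   # fine
# smallest([3, 1, 2])            # rejected by mypy: a plain List[int] is not a SortedInts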
I haven't had good experiences with tagging. There are just too many "tags" (often one per line or group of lines).
They also have disadvantages in that they separate data which is only different in some aspects, but not most others. (For a simple example, the printing function is not interested in a "safe HTML" tag.) The effort needed for constant tagging and untagging (or using "Stringy" classes, or the cognitive overhead you need to pay if you found a clever way to remove the syntactical burden) is too big except in very few cases.
> There are just too many "tags" (often one per line or group of lines).
Where the burden outweighs the benefit, you always have the freedom of typing things too loosely - to be sure, without something like refinement types, there is a granularity you can easily try for that's just not worthwhile.
But that's a pretty different thing than "can't be expressed with mainstream type systems at all." For me, the questions are 1) how far does this type move information in its use through the program, and 2) what part of these are likely to change in ways such that I'll want to be notified about this piece of code here?
> Looking at C++, a practical example is vector. It's a useful tool (automatic memory management), and very likely to be all over in your function signatures. This is very bad in terms of modularity / reusability, because your functions are not defined using reasonably general types (pointers to arrays).
That's why you define your functions as taking generic Iterators instead, which generalize pointers to arrays. Or, in more modern style, generic containers.
You can wrap that in a type erasure layer (look up any_iterator) if you want separate compilation.
> That's why you define your functions as taking generic Iterators instead, which generalize pointers to arrays. Or, in more modern style, generic containers.
Or, in yet more modern style as slices. Or, ...
The only thing I miss from pointers + indices is optional bounds checking. Otherwise, it's the perfect tool for productivity (because readability, writeability, portability, modularity).
> Most of my code now is structured as tables, which greatly simplifies many transformations. There is no pressing need for typing tables, and anyway most of these type systems really suck at "tables transformation algebra".
What do you mean by "tables"? I think of a database table of rows/columns, or a matrix.
Sorry to vent on you, specifically, but many people seem to have missed the point where I said I wanted them.
I just don't want to be forced into a jail cell 100% of the time. I want an escape hatch to do the needful once in a while to do something that the language designer never planned for.
A sufficiently good static type system won't make generic operations harder to implement. It can even make them easier to implement in terms of guiding you towards good generalization techniques.
You can also do full Python-style "duck typing" statically. It's called structural typing, with native support from languages like typescript, purescript, etc. It's used a lot in web dev because it works well with Javascript's object model.
> A sufficiently good static type system won't make generic operations harder to implement.
This is true for the JVM definition of 'generics', but it doesn't mean that arbitrary operations on a collection of types can be easily expressed with a given (static) type system. Run time type information + a dynamic type can ameliorate some of this, but that's shoehorning optional types into a static type system, and I don't think that's quite what you meant.
Think about a generic operation to diff two data structures: you need access to the members of the struct in a generic way. I don't think this is easily solved with static types.
Can you give a specific example where you think you need dynamic types? The diff operation you described sounds like it could be made generic pretty easily, but I may be misunderstanding the goal.
> Can you give a specific example where you think you need dynamic types? The diff operation you described sounds like it could be made generic pretty easily.
How do you figure? This would require macros or runtime type introspection.
A specific example might be "serialize this json type without having to annotate the fields with how to serialize them".
True, but "batshit conservatives" won't do that, or will mess themselves if you do that, rather than making some kind of "Object-Whatever-Mapping" model and annotations.
But that's what I would do (at least in Java) for this kind of transient transfer object, if left to my own devices.
"deriving" is compile time; "Generic" is runtime inspection of the shape of a value (in terms of constructor applications). Something (at least vaguely) similar to Generic exists in many statically typed languages - including (IIRC, IIUC) Java and Go.
> Something (at least vaguely) similar to Generic exists in many statically typed languages - including (IIRC, IIUC) Java and Go.
Sure—but it's in spite of static typing, not because of it. You certainly give up any guarantees of avoiding runtime errors by detecting typing issues at compile time.
Sure, it's orthogonal to static typing (I wouldn't quite say "in spite of"). I wasn't trying to weigh in on the general topic, just to clarify what was happening where.
Yeah, Java has kind of "choked the ecosystem" for the last decade. That monster needs to retire before the industry can move forward (the language, at least, the JVM seems to work well).
Java generics seem easy enough to instantiate and use, but the mess when you implement one really discourages people from trying, usually. And if you do, the result is often a poster child for FUGLY code. Try declaring a java 8 lambda parameter that works with collections some time, and using that in a few places - serious COBOL-fingers and eye-burn time. Even a Pascal style "type" or C style "typedef" mechanism would have greatly reduced clutter when trying to create code that implements (java) generic operations.
OCaml-style structural typing cannot handle deciding at runtime what fields an object has. Arguably it's best to know this statically, but dynamic language users make extensive use of the ability to dynamically add fields to an object.
Some time ago on HN there was a Facebook exploit posted where it was possible to track your friends. All this was possible mainly because, lacking real types, the only stable interface becomes strings and everything becomes stringly typed.
Static types are not only strings and numbers, and static types are about something more than typed variables. Too many times have I run into a situation where I call some API in a dynamic language and have absolutely no idea what I can do with the thing being returned. In a dynamic language I have to either read the docs (hoping they are good) or introspect in a REPL; the IDE won't help. And in the Python world, the wrapper library method I'm calling forwards kwargs to some underlying library and I have absolutely no idea what I can pass without getting familiar with the underlying library. Dynamically typed abstractions are very prone to leaking.
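The kwargs-forwarding pattern in question looks something like this (using the stdlib json module as a stand-in for the underlying library; dumps_wrapper is a made-up name):

import json

def dumps_wrapper(obj, **kwargs):
    # the wrapper's signature says nothing about which keyword arguments are valid
    return json.dumps(obj, **kwargs)

dumps_wrapper({"a": 1}, indent=2)   # works, but only if you already know json.dumps's parameters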
> We need languages that mix compile/parse time (static) and run time (dynamic) types, to mix what fits best to each situation.
C# has the dynamic keyword (not to be confused with type inference via the var keyword), which basically allows you to break out into dynamic typing. I find this to be significantly saner than the other way around.
I think there's a false distinction between static/dynamic types. As if all static type checkers offered the same benefit.
IMO writing a type annotation, or knowing whether a type is an int or a string, is of minimal help in pushing errors into the type system. With a good type system you don't even have to write the types of things; they're inferred.
Better 'statically typed' languages allow you to push more errors into the type system and give you tools for creating stronger abstractions.
Somebody, somewhere, is waiting until runtime to check for a property, and they will burn in Hell for their sins!
Even if they don't do it very often, even if the IDE infers properties most of the time anyway, even if there's less code to have to be refactored, and most of the refactoring that could be automated is still automated.
I would rather use types most of the time for clarity and to help the compiler / editor help me, just not always.
I still think the way Common Lisp handled typing was very convenient: Typing was basically dynamic, but one could use static typing in specific places, and one could even tell the compiler if this was for speed or for safety.
I like the idea of a hybrid type system where statically typed and dynamically typed parts coexist, OTOH, I can imagine it being difficult to implement.
This is exactly what mypy allows, actually! (Except for speedups -- types currently have no effect on runtime behavior, and it turns out they wouldn't make a big difference anyway.) The hybrid type system concept is called "gradual typing", and it's actually powered by some somewhat-recent developments in programming language theory. See this seminal paper by Jeremy Siek in 2006 if you're interested in the details: http://scheme2006.cs.uchicago.edu/13-siek.pdf
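A tiny illustration of the gradual part (by default mypy only checks the bodies of annotated functions, so the two styles can sit side by side in one file):

def annotated(n: int) -> str:
    return "x" * n

def unannotated(n):
    # body of an unannotated function: mypy skips it unless asked to check untyped defs
    return annotated(n) * 2

print(annotated(3))     # checked by mypy
print(unannotated(3))   # not checked, still works fine at runtime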
That sounds sweet! It's been a while since I did anything in Python (at work, for various reasons, I mostly use Perl when I can, C# when I have to). But the next time I do something with Python, I'll make sure to check it out!
Languages never die. But companies stop making new projects with them - for some number of 9s percent of companies. E.g. - FOOGOLTRAN is a five-nine dead language, if 99.999% of new projects don't use it.
Lisp has an extremely strong type system for values (in fact every value is a structure that includes a tag that says its primary type) but zero type checking for variables (a variable can have as its symbol-value literally any value at any time), other than the type hints you provide (which by the spec the compiler is free to ignore). I think a lot of static type systems don't sufficiently appreciate that those are two different questions.
Somewhere on the Internet - I think on the c2.com wiki - there is a discussion on type systems that organizes them on two axes, static vs dynamic and strong vs weak. Static vs dynamic refers to how much the compiler/interpreter knows about the types being used before execution. Strong vs weak refers to how easy or difficult it is to subvert the type system, e.g. by casting pointers or using unions.
That discussion was an eye-opener for me regarding type systems; to this day, I cringe when somebody calls, e.g., Python or Javascript a "typeless" language.
This is a well known distinction, talked about in many places[1]. Python would be dynamic and strongly typed, Perl would be dynamic and weakly typed, at least WRT numerical and string types, which is why it uses different operators to distinguish between those types of operations (compared to Python's use of + for both numerical addition and string concatenation). Perl could also be said to be strongly typed WRT scalars and arrays, so it depends on your point of view.
There is some controversy over the strong/weak terminology, IMO mostly because almost no language is clearly strong or weak based on the definitions generally used[2]. I find it useful to think about whether things are done implicitly, or whether you need to specifically convert types. Perl will happily add "1" and "1" to get 2, because the + operator forces it to be interpreted as a numeric type. Python will not, and expects you to convert to a numeric type first. Of course, exceptions to the rule abound.
(0) Bash is strongly typed. Everything is a string, so there can't possibly be any type conversions, implicit or explicit.
(1) Any language with subtypes is weakly typed. For example, Haskell with Rank2Types or above: every lens is a traversal in a canonical way, and indeed the type checker lets you supply a lens in a context where a traversal is expected.
(2) Rust is even more weakly typed: boxes (as well as other user-defined pointer types) are implicitly converted to references, iterable objects are implicitly converted to iterators, etc.
There's no precise technical definition of "strongly typed", but AFAICT, the intuitive idea is that a "strongly typed" language prevents you, or at least makes it difficult to do things that aren't likely to make sense, like treating the same datum as a number and as a string in different places. This is independent of whether the language has implicit conversions.
Sorry, don't get me wrong, my point was just to illustrate the silliness of kbenson's definition using reductio ad absurdum. Haskell and Rust aren't weakly typed in any conceivable universe.
> (0) Bash is strongly typed. Everything is a string, so there can't possibly be any type conversions, implicit or explicit.
If it were true that everything is a string, then yes, you probably could say it's strongly typed. It's laughable, but it is. Some single celled organisms are animals, as well. Assembly is generally the strongest typed language you can get.
But it's not true. Bash supports associative arrays.
That said, I'm not sure if you can actually access the array container itself (I don't use bash for actual programming), so I'm not sure how it handles comparisons and operations between the types, if it's even possible. It may be Bash is strongly typed (I'm not sure), but we can't make that determination by assuming it has a single data type. Also, even if it was a single data type, there are interstitial types that exist for doing numeric operations, and I'm not sure how to consider those with respect to this concept.
> (1) Any language with subtypes is weakly typed. For example, Haskell with Rank2Types or above: every lens is a traversal in a canonical way, and indeed the type checker lets you supply a lens in a context where a traversal is expected.
>
> (2) Rust is even more weakly typed: boxes (as well as other user-defined pointer types) are implicitly converted to references, iterable objects are implicitly converted to iterators, etc.
You're making very strong assertions based on my comment when I made sure to say "almost no language is clearly strong or weak based on the definitions generally used", and even provided examples where the same language could be considered both strongly and weakly typed, depending on what aspect you were referring to.
"Strong" and "weak" are themselves relative terms, so make no sense unless comparing to something else. That may be a shared reference language, or the perceived common level of languages with respect to what is being discussed. I replied to the sibling of my original comment with some info about how I think about this. It's rarely useful to think of a language as entirely strongly or weakly typed, but certain aspects of it may or may not be (again, relative to some benchmark).
> There's no precise technical definition of "strongly typed", but AFAICT, the intuitive idea is that a "strongly typed" language prevents you, or at least makes it difficult to do things that aren't likely to make sense, like treating the same datum as a number and as a string in different places. This is independent of whether the language has implicit conversions.
Well, treating the same datum as a number and a string in different places is exacerbated by implicit conversions, which allow that behavior to succeed. I wouldn't say it's independent, I would say one depends on the other.
Really, whether they are well defined terms or not, I think it's still useful to use them with some disambiguating context, as they express a concept otherwise hard to convey (if it was easy to convey, they wouldn't be so ambiguous in the first place).
Nah, what's actually useful is to determine how much of your program specification can be enforced with a given type system. That's well defined, and even relatively easy to assess before you pick this or that language for a given project.
I agree that is useful, but it's far from the only useful thing, and I don't think it's entirely distinct from the point at hand. When dealing with subtyping, covariance and contravariance can be seen as a weakening of the type system to make certain things easier.
I think a case could be made that any subversion of requiring the exact type defined or requiring manual conversion is a weakening of the type system, for the definition of weakening that we are using. The important consideration is that neither stronger nor weaker are always necessarily better or worse.
I didn't say it's the only useful thing, but it's perfectly reasonable to measure the expressive power of a type system (or any other language feature, really) by how much it buys you.
> can be seen as a weakening
Covariance and contravariance support by no means "weaken" a type system - they allow you to express more statically, not less.
Well, is expressiveness how we are measuring how "strong" a type system is? It's definitely one way to measure it, but I would rather describe those in terms of safety and versatility. If by "strong" in a type system we mean explicitly following what was specifically defined, then allowing anything except the specified type can be viewed as a weakening of the type system, regardless of whether it's provably safe or allows more expressiveness. That is, I think strong and weak are orthogonal to safety and expressiveness (even if usually stronger is safer; as with most things, anything taken to an extreme starts to lose its useful attributes). I.e. stronger is not necessarily equivalent to better.
That's exactly what expressiveness is. If a type system is versatile enough, you can express what you need to express without bypassing type safety.
> anything except the specified type
Even in this sense, subtyping doesn't weaken a type system. The definition of subtyping is the subsumption rule: given `S <: T` and `x : S`, you can derive `x : T`. (TAPL, p. 182) For similar reasons, adding ML-style let polymorphism to a type system doesn't weaken it.
> strong and weak is orthogonal to safety and expressiveness
I wouldn't say so. Perl is expressive. That expression sometimes comes at a cost of safety, but not always. Safety and expressiveness are not necessarily related.
> Then what exactly is it?
The exactitude of requiring exactly what is expected, and not allowing any substitutions. I'll explain my reasoning below.
> Even in this sense, subtyping doesn't weaken a type system.
Subtyping doesn't even always make sense when talking about strong or weak types. Some languages don't even have subtypes, yet we can still talk about how strong or weak their typing is just by referring to how different types interact, and whether conversion between them is implicit or explicit. Again, I'll call on Perl, which is useful in that it's very, very far from the type systems you have been describing WRT Haskell and Rust (and many languages that use static typing and inheritance).
Perl has two types of typing: variable (data structure) typing, and value typing. Perl is fairly strongly typed when it comes to variable types. It's either a scalar (which can contain many things, including references to other types), an array, which contains scalars, or a hash, which contains scalars (I'll leave out typeglobs and filehandles since they are fairly specialized). Until a few years ago, you could not supply a scalar where a hash or array was expected (e.g., you could not push($foo,"string"), you could only push(@foo,"string") ). You could explicitly dereference a scalar containing an array reference to an array and push to that, but it was an explicit dereference which required a @ prefixed to the scalar[1].
Perl is fairly weakly typed when it comes to values within scalars. It will attempt to automatically convert the value to the appropriate representation such that the operation makes sense. Thus, adding two strings containing numbers requires no explicit conversion. Neither does concatenating two numbers into a string. Neither do any string or numeric comparisons, as the conversion is implicit within the operator, since it knows what type it expects. This is why the expressions "1" + "2" and 1 . 2 in Perl are completely unambiguous. The first adds 1 and 2, yielding 3, the second concatenates the strings "1" and "2", yielding "12". The original type presented doesn't matter; it's implicitly converted as needed.
Subtyping, in this case, is irrelevant, because it doesn't exist in this language for these kinds of typing. It's entirely valid to talk about strong and weak typing in both these aspects of Perl, but they are both different and neither really matches the typing you are referring to with "type systems", because this language functions at a different level. The only way I can see using strong or weak typing as a descriptor across many languages is to define it as how exact the type of data you are supplying needs to be to match what is expected. That is, "strong" has entirely to do with constraints and explicitness. Anything that allows implicit action, even using a subtype which is known to be safe, without some indication that you are raising or lowering the type explicitly so it can be used, is a weakening of that constraint (even if it's provably safe; again, we aren't saying it's strongly safe, just that it's strongly constrained by type).
This reasoning leads to one of your original example questions:
> Rust is even more weakly typed: boxes (as well as other user-defined pointer types) are implicitly converted to references, iterable objects are implicitly converted to iterators, etc
Yes. In some aspects, Rust has weakened its typing in very specific circumstances to allow for more expressive programming, by not requiring explicit conversion when it can be inferred by the compiler. In this respect, Rust is more "weakly" typed than it could be, but it's also no less safe, no less fast, and more expressive for doing so, which I'm sure we would both agree is a good trade-off to make in this instance.
I hope that explains my view, and my reasoning, sufficiently. Let me know anything about my reasoning that confuses you, and I'll try to explain better (and of course feel free to poke holes where you see them).
1: Perl experimented with allowing core functions that expected an array or hash to allow a scalar containing a reference to such to work without the dereference, but ended up backing out that change.
Sure, but I was talking about the expressiveness of the type system, not the term language. The expressiveness of a type system lies in how much it tells you about your program, without having to actually run it.
> Rust has weakened its typing
I was trying to use reductio ad absurdum to question the validity of your definitions. I expected "Haskell is weakly typed" and "Rust is weakly typed" to be self-evidently false propositions.
> Sure, but I was talking about the expressiveness of the type system
Expressiveness can be described as expressiveness. I'm not sure why we would need to use strong/weak as an alternative to that term. IMO, it makes the strong/weak terminology less useful by tying it to this concept, which is only really relevant in some languages.
> I was trying to use reductio ad absurdum to question the validity of your definitions. I expected "Haskell is weakly typed" and "Rust is weakly typed" to be self-evidently false propositions.
I'm not sure if this is implying that we disagree on something that you still see to be self evident, or it's meant to explain why you included that and you no longer think it's a position worth holding?
In any case, I think it's worth clarifying that I didn't call Rust weakly typed, I said it had weakened its type constraints in a select few cases to increase expressiveness, where it did not cost in safety. (Constraints because everywhere you put strong/weak I think you can append constraints to be more explicit.) I don't think it's controversial to say "Rust weakened constraints the type system imposed in some cases where it could be done safely, such as in iterable objects implicitly being seen as iterators in some cases", as that's clearly a bit less constrained than it could be.
Expressiveness is always relative to what a system is meant to express. The purpose of a type system isn't to write your actual programs in it. You use a type system to make sure that programs written in another language (a term language) make sense. So a type system is expressive to the extent it accurately captures the distinction between programs that make sense and programs that don't. (Of course, Goedel tells us that no decidable type system can perfectly capture this distinction.)
As for strong/weak, I don't think the concept is very useful without a clear technical definition in the first place.
> As for strong/weak, I don't think the concept is very useful without a clear technical definition in the first place.
Well, it's definitely not very useful in our case. ;) In all seriousness, as I said previously, I think it's useful for conveying a mixed bag of things to people that aren't very well versed in the specifics. If the person knows something about type systems, you should naturally have more specific language to fall back on to express your intent better.
In any case, I think we are way past the point where this is worthwhile to continue. I think we've plumbed the depths of this topic, whether we agree or not.
Just because there is some controversy over it doesn't mean it's meaningless. In this case it just means that it's not as descriptive as we would hope, and it's better used to describe an aspect of a language, as languages are often strongly typed in some aspects and weakly typed in others.
First, my post said 'essentially meaningless', which is to say 'possibly not completely, but for the most part, meaningless'.
Second, the context here is whether or not the type system of a language is strong or it is weak.
I stand by my assertion that describing type systems as strong or weak is essentially meaningless. It's so far from descriptive that all 'strongly typed' ever really means is 'I like this feature' and 'weakly typed' means 'I do not like this feature'.
There are alternative, accurate ways of talking about what people think they are trying to get at when they use the terms strong and weak in relation to languages. They should be used instead.
Bravo! So much hair-splitting and (deliberate?) misinterpretation of context on the internet. We need more responses done your way to keep discussions focused: factual and to the point.
I vaguely remember reading that this did not originate on Lisp machines, but they supported this at the hardware level, making it a lot faster. (Like I said: "vaguely"! This is third- to fourth-hand knowledge, at best.)
Optional static typing is fine. But this approach to it is not good. The correctness of the type information is optional. PEP 484: "While these annotations are available at runtime through the usual __annotations__ attribute, no type checking happens at runtime." (Italics in original) That's just not right.
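Concretely, here's a minimal sketch of what the PEP describes (the function is made up): the annotations are recorded on the function object, but nothing checks them when the code actually runs.

def double(x: int) -> int:
    return x * 2

print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
print(double("ab"))            # prints "abab"; only a checker like mypy would flag this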
If you take that seriously, compilers can't use type information to optimize. It feels like van Rossum is making life hard for the PyPy compiler project again. If they had hard type information, they could do serious optimizations. Guido's CPython still uses a PyObject for everything internally, so it doesn't benefit much from this.
The direction that everybody else seems to be taking is that function parameters, structure fields, and globals are typed, but local types are inferred as much as possible. C++ (with "auto"), Go, and Rust all take that route. Type information on function parameters and structure fields is an important item of documentation; if the language doesn't support that, you need to put it in separate documentation or comments. But writing elaborate type declarations for local variables is no fun, and the compiler can usually infer that info anyway.
Frankly, each of your three paragraphs seems unrelated to each other, so I'm still not sure what your argument is.
1) Not type checking at runtime is exactly what people mean by static typing, because the definition of static typing means the checks are done before execution.
2) This does not make it harder to optimize; on the contrary, if they generated type assertions it would considerably slow things down at the boundary of typed and untyped code.
3) C++, Go, Rust, Haskell are all fully statically typed; the only difference is that they support varying levels of type inference, which means you don't need to "type out the type", but that's really orthogonal to the whole optional/static/dynamic typing issue.
C++ and Go don't have type inference. All they do is set the type of new variables to whatever they're first initialized to. Type inference means figuring out the types of variables from how they're used. The litmus test for type inference is inferring function argument types.
Rust and Haskell have very similar type inference. The only major difference is that Rust doesn't infer type signatures, but that's a design decision, not a power issue.
> "Type inference refers to the automatic deduction of the data type of an expression in a programming language."
Depending on how simple "deduction" can be, that arguably includes every language that includes expressions.
And if the deduction is required to include usage information, it quite arguably excludes C++/Go style "var": a variable declaration is not an expression.
Generally, that Wikipedia article doesn't seem very good - check the talk page.
True, C++ and Go don't have a real type inference engine. They just compute the result type of an expression and use that to create the type of a variable in an assignment. However, this handles enough of the common cases to be quite useful.
(When you have a real type inference engine, you spend a lot of time trying to figure out what it did, or why it didn't do something.)
Nobody forces you to use type inference. If you think a particular piece of code is too tricky to understand without explicit type annotations, by all means use explicit type signatures.
That being said, I have never encountered a program that was hard to understand because types were inferred rather than explicitly annotated. I use explicit type annotations in two situations:
(0) As a compile-time analogue of printf debugging. Not exactly a joyous thing.
(1) At module boundaries, to control what modules expose to each other.
Perhaps the parent was trying to talk about "HM type inference" or something, but "type deduction" is also a fine name for a subset of type-inference, and perhaps more explicit for C++ and Go.
Perhaps you'd be better off taking it up on Wikipedia and the primary sources, instead of arguing with random internet people who are using those same definitions.
I disagree. Naming things is useful. "Type deduction" says clearly what happens, and is still distinct enough from "type inference" which is a different, more sophisticated concept. I've seen the term "type deduction" used many times in the context of C++ and Rust.
Rust has actual type inference. It just happens to be a local affair. For example, you can create a vector, even an empty one, without explicitly specifying its element type. But, somewhere in the current function, you have to provide information about the element type, say, by inserting elements into the vector, or by using an element in a way that constrains its type.
The ability to name things is useful, but there are far more things out there than we can come up with good names for, so we have to decide wisely what things we name and what things we don't. Something like C++'s `auto` doesn't need a name of its own. It's just unidirectionally propagating type information from child to parent nodes in the AST. This is something even FORTRAN (when its name was all caps) and ALGOL implementations did during normal type checking.
Sorry, what are you talking about? Python runtime still does its existing dynamic type checking. You can still get a TypeError at runtime if you try hard enough.
Mypy is not about optimising anything, it's about static correctness guarantees. If you're developing PyPy, type hints shouldn't be making your life any better or worse.
Lastly, the 'direction' you describe that everyone else seems to be taking, is the exact same direction mypy has taken. Type annotations are only required at the method/function level.
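For instance, a minimal sketch of that direction in mypy terms (names are illustrative): annotate the signature, and let the checker infer the locals.

from typing import List

def total_length(words: List[str]) -> int:
    lengths = [len(w) for w in words]  # inferred as List[int], no annotation needed
    return sum(lengths)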
Enforcing type hints would be impossible to implement without breaking a lot of existing Python code. If module A suddenly introduces them, module B that depends on it would have to suddenly make sure that all arguments it passes conform. If another module C depends on B and passes arguments that are in turn passed to module A, that basically means that the author of B is forced to add type hints to its code as well. In practice this would rapidly turn the whole language into a semi-statically typed one which (a) was never the intention and (b) would turn the whole ecosystem into a complete mess with incompatible typed and non-typed code unable to interact with each other.
If you add type hints to a function and it detects an error because it's called with the wrong type, it probably didn't work anyway.
I've run into that with functions that take NumPy arrays. You can pass a Python array, but "+" behaves very differently. For a NumPy array, it's element by element addition. For a Python array, it's concatenation. You want to catch that.
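A quick illustration of the difference, using a plain list for the "Python array" case:

import numpy as np

a = np.array([1, 2, 3])
b = [1, 2, 3]
print(a + a)  # [2 4 6]            -- element by element addition
print(b + b)  # [1, 2, 3, 1, 2, 3] -- concatenation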
> It feels like van Rossum is making life hard for the PyPy compiler project again.
Or perhaps, he realizes that to require these typechecks at runtime would either
- require a massive rewrite of the CPython interpreter
- cause a performance penalty in code that is statically typed (there are pure python libraries that allow this, and yes your code runs slower)
Further, it offers a relatively narrow amount of help to pypy, which has already come out and said that they have no interest in using annotations for efficiency gains, because they're a jit and they don't need them.
PyPy is written in RPython, which is Python with type annotations strict enough to allow hard precompilation to C. The type syntax is completely different from this new thing.[1] Only the built-in types are fully defined. There's "anytype", "sequence", and "object" to cover more general types without giving full details. This allows hard compilation of the stuff that compiles down to machine code, while dealing with user-defined object types more dynamically. That's one approach to typing Python.
van Rossum on PyPy (2016): "Otherwise, there are specialized versions of Python like PyPy that are still around."
I don't think there's anything preventing a future version of Python from implementing an optional command line flag to enforce declared types at runtime. And maybe the breaking change that results in a Python 4 will be enabling that option by default.
It's really strange to compare Python to C++, Go, and Rust, given that all the latter are statically typed languages with some amount of type inference, while Python is dynamically typed. A more appropriate comparison would be to something like JavaScript, where TypeScript and Flow both implement the same exact approach - optional type annotations that are checked at compile time and not enforced at runtime. Experience in the JS ecosystem has shown that it is a very successful approach.
Well in the pypy case, couldn't this just be a toggle? As in, you as the user guarantee that your type annotations are correct and strict, and if they aren't, then pypy will just blow up in your face when the annotations aren't respected.
I think that's a fair trade off, and easily doable. Those who want it and respect it get the speedup, those who don't don't lose anything.
I assure you, it's not that simple. You either check types at runtime (big performance penalty, aka CPython) or you don't, because a compiler (or JIT) did the leg work.
If you have to double check types at runtime, you've already lost.
But that's why I say there should be a pypy flag where you tell it not to double-check at run time and that you've made sure your annotations are correct.
Now if it isn't correct, you'll probably get a segfault or some nasty error and it'll blow up.
> For the many programmers who work in large Python 2 codebases, the even more exciting news is that mypy has full support for type-checking Python 2 programs, scales to large Python codebases, and can substantially simplify the upgrade to Python 3.
Woah! Excellent news. Really cool that they found a way to use this opportunity to bolster Python 3 migration.
Does Sphinx (or any other automated-documentation tool) read the MyPy annotations, or would I have to type them in twice, in different syntaxes, if I wanted types in my docs + linting?
Or does type documentation not matter that much, once it's being enforced by a tool?
I am extremely excited about this. I already verify parts of my applications, but better support (especially in libraries) will make this that much more useful. It's an essential part of the checking pipeline (with pyflakes, pep8, McCabe, etc).
Having started to use Python recently, coming from Go, I found:
- the official docs are useless, return types are often not documented (wtf?) and the docs aren't structured and clean. Python documentation is a giant mess.
- mypy is a godsend. I used it from day one to document my Python code, but it needs better integration with Python doc tools, 3rd party linters and co.
I'm ok with optional type checking. A team can make it mandatory during continuous integration while not breaking legacy code, but I hope mypy will be directly integrated into Python in the future and not remain a 3rd party tool.
I sometimes have to write a Python script. The documentation documents return (and expected) types, but in a manner that is a bit alien to C or C++ programmers. It is more talky. That is probably due to the dynamic nature of Python.
Instead of a list of possible returns with some kind of prototypes, you may be left with a description: "The returned object is always a file-like object whose file attribute is the underlying true file object.".
See for example the documentation for tempfile.NamedTemporaryFile [1]. This time the possible returns are not in a separate paragraph, as they are in many other cases. You can't click anything except TemporaryFile(), so you have to search for the proper reference. The information is there, but instead of a quick glance you have to parse and search more text.
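To illustrate what a hint would buy you here, a hypothetical signature (not the actual stdlib one) answers "what do I get back?" at a glance, where the prose needs a sentence about file-like objects:

from typing import IO

def open_scratch_file() -> IO[bytes]:  # hypothetical wrapper, for illustration only
    ...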
> It has a far lower false positive rate than the commercial static analyzers I’ve used!
Hoo boy. That's not a good thing. A consistent type system doesn't have "false positives". Reading between the lines, this means "the type checker is incomplete and allows incorrect types."
What is the story here with constraints, polymorphism, etc?
I wish PEP 484 put more thought into backward compatibility with Python2.
Currently you need to import things like Callable, Tuple and these don't exist in Python2. It is especially odd since I am using MyPy comment syntax, so MyPy needs import statements even when all types are inside comments.
I like the way Haskell puts type information above the definition.
In Python 2 and also in Python 3.4 and earlier, you get Callable, Tuple, and the rest from the `typing` module on PyPI: https://pypi.python.org/pypi/typing
This is a direct backport of the same `typing` module that's in the stdlib starting in Python 3.5.
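For what it's worth, a small sketch of how that looks under Python 2 with the comment syntax and the backported module (the function itself is made up):

from typing import List, Tuple  # from the `typing` backport on PyPI under Python 2

def midpoint(points):
    # type: (List[Tuple[float, float]]) -> Tuple[float, float]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)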
(I'm one of the mypy core developers and work at Dropbox. We're using PEP 484 and mypy on a lot of Python 2 code!)
Here's what I don't understand -- every discussion about static typing is full of people saying static typing is so good and useful. Everyone I talked to, everyone I work with, we all prefer static typing. And yet... Javascript and all that.
Javascript runs in the browser natively. Its primary advantage is that you don't need to install anything other than the browser. The reasoning behind node.js is that a lot of front end engineers no longer need to learn and use a separate language for the server, and maybe can even reuse libraries between the server and browser.
Javascript is the bash of the internet. It's not that great but it does have a "killer app".
Why do you think people are replacing Javascript with Typescript? Ultimately, type resolution at compile time is better than unmaintainable Javascript.
Good question! There are separate capitalized types for List, Dict, etc provided as part of the `typing` module in order to let you specify element types. Types are all normal Python expressions, so if you wrote `list[int]` you'd get a runtime error, because `type` objects (i.e. the `list` class) aren't subscriptable.
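A short sketch of the difference (behaviour as of the Python versions current at the time; the function names are made up):

from typing import List

def first(xs: List[int]) -> int:  # capitalized List lets you name the element type
    return xs[0]

def head(xs: list) -> object:     # builtin list works too, but can't say list-of-what
    return xs[0]

# whereas `list[int]` raises "TypeError: 'type' object is not subscriptable" at runtime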
I'll say Python isn't any more dynamic than Ruby and Groovy, and they have fairly stable and usable static compile target releases/ports [1]. Python has a few [2] attempts, but none of them is close to stable or really useful.
Didn't forget it, just didn't include it since to me, it's more like a totally different (and not pretty) language that tries to look like python. I should however include Nuitka [1] since it is fairly usable, it's just the generated version isn't much faster than its interpreted python version.
Why is Cython a totally different language? It's definitely a superset, what with all the custom syntax for type annotations, struct definitions etc. But you can ignore all that, and just stick to the pure Python subset.
Guess you can say Crystal is a totally different language as well. It's just to me, Crystal still has the same feel as ruby and Cython doesn't. Again, it's just my opinion, I won't argue if you feel otherwise.
You're arguing that dynamic functionality doesn't stop you being able to statically compile Ruby, giving Crystal as an example - a language which had to drop most of the dynamic functionality of Ruby to achieve better performance.
> groovy, and they have fairly stable and usable static compile target release
Citing some Groovy 2.x Release Notes to prove a claim of "stability" for Groovy is like quoting an advertisement for Parmolux washing powder to prove its "effectiveness" at washing out dirt.
Not many developers bother with Groovy's static compilation annotations. It's used mainly as a scripting language for testing Java classes, gluing together Java programs, as a build DSL in Gradle, and scripting in Grails. When static annotations are used, their use is typically small-scale.
[Cython] is a programming language that makes writing C extensions for the Python language as easy as Python itself. It aims to become a superset of the [Python] language which gives it high-level, object-oriented, functional, and dynamic programming. Its main feature on top of these is support for optional static type declarations as part of the language. The source code gets translated into optimized C/C++ code and compiled as Python extension modules. This allows for both very fast program execution and tight integration with external C libraries, while keeping up the high programmer productivity for which the Python language is well known.
The primary Python execution environment is commonly referred to as CPython, as it is written in C. Other major implementations use Java (Jython [Jython]), C# (IronPython [IronPython]) and Python itself (PyPy [PyPy]). Written in C, CPython has been conducive to wrapping many external libraries that interface through the C language. It has, however, remained non trivial to write the necessary glue code in C, especially for programmers who are more fluent in a high-level language like Python than in a close-to-the-metal language like C.
Originally based on the well-known Pyrex [Pyrex], the Cython project has approached this problem by means of a source code compiler that translates Python code to equivalent C code. This code is executed within the CPython runtime environment, but at the speed of compiled C and with the ability to call directly into C libraries. At the same time, it keeps the original interface of the Python source code, which makes it directly usable from Python code. These two-fold characteristics enable Cython’s two major use cases: extending the CPython interpreter with fast binary modules, and interfacing Python code with external C libraries.
While Cython can compile (most) regular Python code, the generated C code usually gains major (and sometime impressive) speed improvements from optional static type declarations for both Python and C types. These allow Cython to assign C semantics to parts of the code, and to translate them into very efficient C code. Type declarations can therefore be used for two purposes: for moving code sections from dynamic Python semantics into static-and-fast C semantics, but also for directly manipulating types defined in external libraries. Cython thus merges the two worlds into a very broadly applicable programming language.
def primes(int kmax):
    cdef int n, k, i
    cdef int p[1000]
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
Type annotations and mypy are a great idea, sure. But IMO the low hanging fruit is from static analysis tools like pyflakes, prospector, bandit. Most projects don't even capitalize on these and your time is better spent remediating the findings from these and/or carving out the exceptions where necessary.
Does PEP 484 plan to allow runtime reification of types (ie type information during runtime via reflection aka Java/C#)? I made a cursory check of it and it is unclear but I would assume no.
I know some may disagree but I have found runtime reification of types fairly useful in the Java world (partly because Java does not have good code generators like OCaml/Haskell and partly because Java is not that expressive).
Mypy is great, but it still has some ways to go. For instance, the type checker is thrown off by descriptors (__get__ and __set__ are ignored). Further, the typing module needs some work. The types in the module are sufficient for annotating simple programs, but the more "meta" the programming becomes the more walls will pop up.
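A minimal sketch of what I mean (class names made up): at runtime the attribute access goes through __get__ and yields a float, but the checker ignores the descriptor protocol and sees the descriptor object instead.

class Celsius:
    # toy descriptor: __set__ stores a float, __get__ hands it back
    def __get__(self, obj, objtype=None) -> float:
        return obj._temp

    def __set__(self, obj, value: float) -> None:
        obj._temp = float(value)

class Thermostat:
    temperature = Celsius()

t = Thermostat()
t.temperature = 21.0           # goes through Celsius.__set__
warmer = t.temperature + 0.5   # fine at runtime; the checker sees a Celsius instance here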
IMO adding static types is a good step, but it sort of misses the point of modern type systems.
The good thing about types is not that you can just annotate (or not, with good inference) and validate that some type is what you expect it to be. It's that with a proper type system you can develop better abstractions.
It bothers me when the constructs of the language are not used for things like this. For example, the syntax for a list of ints is `List[int]` as opposed to `list(int)`, where list is an actual function in Python and so it makes more sense. Obviously list(int) has a far different meaning, but List is a totally foreign thing in Python. I think it makes sense for a DSL to remain as close to the parent as possible.
Another example of this is most of the prismatic/schema-esque frameworks for python to do validation on datastructures. Instead of saying foo must be { "name": "String" } (where there is a framework-specific meaning behind the literal word "String") we should take advantage of the built-in types and say {"name": basestring}.
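A rough sketch of what I mean (the schema format here is hypothetical, not any particular framework's): letting isinstance do the work against the builtin types directly.

schema = {"name": str, "age": int}  # hypothetical schema: keys map to builtin types

def validate(data, schema):
    return all(isinstance(data.get(key), expected) for key, expected in schema.items())

validate({"name": "Ada", "age": 36}, schema)    # True
validate({"name": "Ada", "age": "36"}, schema)  # False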
EDIT: Solved: Either just list or List[int] with import:
from typing import List
Thanks for all the help guys! By the way, I took the example from the mypy page. A classical case of 'critical line missing from example code' (as perfected by Bjarne Stroustrup back in the day).
Yeah, you need to write `from typing import List` to get the generic List class that you can parameterize with an item type. As you pointed out, you can use the builtin lowercase-l list without importing anything, but that doesn't allow you to specify the item type.
It's basically the same syntax as Rust, which I honestly like. I mean, I didn't like the syntax to begin with (I have always been a stickler for how name: type introduces a superfluous colon over type name syntax) and using two glyphs for returns is horrible, but since several languages are now doing this, if you are going to do it, you might as well keep it consistent.
ddfisher up there is already talking about stuff like
----
It should work with the Python 3 syntax if you quote your type annotation. I.e.
def f() -> "Bar": ...
instead of
def f() -> Bar: ...
---
Yeah, you need to write `from typing import List` to get the generic List class that you can parameterize with an item type. As you pointed out, you can use the builtin lowercase-l list without importing anything, but that doesn't allow you to specify the item type.
---
imo, this is brutal. I'll just wait and see whether I'm proven right or wrong, but the extremely poor aesthetics of the whole thing bode badly.
...and so we risk shuffling even further away from what made Python great. Simple pragmatism. Now in addition to unicode hair tearout, xrange forcefeed, dict access strait jacket, print function dogmatism, and extraneous unused async library which was already implemented 14x elsewhere, we are moving towards strictly typed Python?? If you want all this stuff, surely there is C, Ocaml, Go, or Typescript, etc, all of which are much faster?
It is a common misconception that typing makes stuff more complicated and inconvenient. Some type systems are quite bad (not powerful enough, so they do more harm than good), but that does not mean that the idea of typing itself is bad, only that their implementation is. In my opinion Python kicks ass because of world-class libraries for nearly anything, and not because it's dynamic (a broad term; I would say it's strongly typed) and you do not have to declare types (which can often be inferred).
You listed 4 specific languages, but the following things are each inconvenient in them:
- C: it's inconvenient to implement a quick prototype
- Ocaml: it's inconvenient that the stdlib is so small
- Go: it's inconvenient for scientific/numerical code
- TypeScript: it's inconvenient to run server-side
Obviously you can do everything in any language (Turing, blabla), but the question is, how much effort (time) does it cost.
Python rocks in all of them (edit: the big exception being frontend webdev), in particular for quick prototyping. I think mypy is an exceptional and fine contribution to the eco-system and I really appreciate their effort.
Being in the middle of a struggle with Javascript tooling (Javascript, ES5 v/s ES6, typescript, webpack v/s System.js, shims, polyfills, npm downloading the internet, etc, etc) I can't help but imagine that there should be a better way.
How much I would like WebAssembly to be ready to open the gates to other languages on the frontend. Does anyone here have more information about WebAssembly / Python in the browser? I would love to know what the current state of the art of running Python in the browser is, and to contribute to making the dream come true.
I have on my radar Brython [0], py.js [1](seems to be abandoned) and pypy.js [2]
I don't know what "strictly typed Python" is, but the type checking discussed in this post is 1) completely optional and 2) static. You don't have to do it if you don't want to, and even if you do do it, it doesn't affect in any way Python's runtime behavior or flexibility. You can have a codebase that works fine yet at the same time fails mypy's checks.
Also, no, we are not moving towards "strictly typed Python", even if that were a thing. PEP 484, which laid out the standard type hints, included this line in bold under "Non-goals" [0]:
> It should also be emphasized that Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.