I thought this was going to be about non-local type inference (i.e. write down a bunch of type equations and iterate until a fixed point) as opposed to simple bottom-up inference. The former has benefits, but also disadvantages: sometimes obscure and unintended types can be inferred, and it makes features like "give me the type of this expression" difficult.
The author is talking specifically about inferring the type of a variable from the type of the RHS of its initialization. Opinions differ on how much you should use this (he points out Go, and Go is nothing if not opinionated) but there are clear benefits to having it as an option.
Suppose you have
foo(super_long_name(), x, y, z)
and you want to refactor it into
var bar = super_long_name()
foo(bar, x, y, z)
This is only a stylistic change, but if you have to write down variable types, you need a piece of information for the second that you don't need in the first. If the first was not unclear (just long), why should the second be?
The type name may also be difficult to write down, or simply unenlightening. Knowing the type of a variable is not always helpful during normal reading (e.g. the types of iterators in C++ or Rust).
Also, in generic code (e.g. templates or macros), you might just not know the type of some expression. I believe this was a major argument in favor of adding auto to C++.
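To give the iterator point a concrete shape, here is a small C# sketch (my own hypothetical snippet; the same issue the comment describes for C++ and Rust):

using System.Collections.Generic;

var orders = new Dictionary<string, List<int>>();

// Written out in full, the enumerator's type is long and not very enlightening:
Dictionary<string, List<int>>.Enumerator explicitIt = orders.GetEnumerator();

// Inferred, the intent is exactly the same with far less noise:
var inferredIt = orders.GetEnumerator();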
> Knowing the type of a variable is not always helpful during normal reading (e.g. the types of iterators in C++ or Rust).
Not only that, but there are some types in Rust that you _can't_ even specify; closures specifically don't have concrete types that can be specified in a Rust program.
Is the problem really type inference? If it really was, you'd have to disallow chaining method calls, or using method calls inline as parameters to other methods. Every intermediate value would need to have its type annotated explicitly, not just named values.
I don't think that's an improvement. Languages like Java and C++ don't work that way, and they're the ones being decried for having to spend too much time writing out redundant types.
I think the problem is that people don't treat types as documentation. List them explicitly when it clarifies. When it doesn't, leave them out. A lack of documentation is better than useless documentation. Treat types the exact same way.
TIL that Java 10 added local-variable type inference, to which I say, about damn time. I indeed decry having to type (and read) Foo<Bar> baz = new Foo<Bar>(), and no, Foo<Bar> baz = new Foo<>() isn't much better. I tested a project a couple of years ago using Java 8, and the contrast with the magical inferential powers of lambdas was stark.
Maybe it's because I've spent so much time in dynamically typed languages, but I see var/auto as absolutely essential to making static typing palatable. I'm a big fan of static typing, but I would be much less so if I had to do it without inference/deduction.
While I agree with the desire for experimental data, you're assuming that the status quo[1] is the right answer. There's no evidence that it is -- the reason it wasn't mainstream until recently was mainly fear of going against the status quo.
ML and Haskell, for example, have basically had (local and near-global) type inference since the 1980s.
[1] As of, say, 2009 or so where even local type inference was relatively rare in mainstream languages.
This is highly personal. Best would be to have an IDE that shows the concrete types if desired by the user. In VS you can hover over a variable, but a permanent display would be nice too.
As a data point: I use Scala + IntelliJ and I've personally actually turned this off. If you have a half-way decent type system I tend to find that it just gets in the way -- at least if things have reasonable names.
(This is obviously a very personal preference and depends on whether it's a code base you're already familiar with, etc. etc.)
and then there's counterproductive documentation. It may have been correct at one time, but now it both wastes your time and misleads you. This is why pushing as much documentation as possible into the type system is a win.
I quite like type inference when it's clear from context what the type will be, e.g.:
var xs = new List<int>();
foreach (var x in xs) { ... }
I'm pretty happy for it when the type "doesn't matter" because you don't do anything with the value except pass it elsewhere.
// Presumably refactored from processSomething(x, y, z, getSomething()) because the line got too long.
var something = getSomething();
processSomething(x, y, z, something);
And sometimes the type can be sufficiently unwieldy to name, most famously in C++ taking advantage of templates, but also can show up in other languages such as Haskell that encourage this behavior.
Finally there's this comment:
> A weaker argument I have heard is that you have less code to type with type inference!
I'd say that I agree with it as stated, but I'd hope the people making that argument are actually trying to say "it's less code to read/be distracted by". IMO omitting the "less important" type information draws the reader's attention to the ones you did write, which are presumably the more interesting parts of the code.
Having said all that, I think my brain is a bit quirky because even in languages like Python I automatically/internally infer all the types based on the names of variables and roughly how/where they're used as I read through. Maybe it's an experience thing (or maybe I don't work with bad/differently-thinking developers), but I find that my first or second guess is right 99% of the time.
// Presumably refactored from processSomething(x, y, z, getSomething()) because the line got too long.
var something = getSomething();
processSomething(x, y, z, something);
Sigh, people will add extra lines, refactor and risk introducing bugs simply to avoid a text line longer than a punch card from 1965.
My terminal width is about 120 characters because my vision is poor enough that this is how many characters fit on my monitor horizontally. I appreciate coding standards that limit the width of a line of code because it makes it considerably more accessible for me. Programmers with poor vision aren't that uncommon -- it's not completely out of the question that your team will hire one some day.
I use it as an indicator that I am either nested too deeply or my functions have too many parameters. Or just in general my code is too complex / hard to read after the fact.
Sometimes things call for breaking the soft limit, and that's fine; it's only a signal of potential problems, not an indicator of definite problems.
And fwiw, if you can't refactor like that without concern it's a language design failure. A pretty big one actually.
First, if "var" were such a huge problem then languages like Python would be totally unusable! The entire language doesn't have explicit static type definitions.
LHS type inference is a tool. Like literally anything else you can write code that makes things less maintainable using this tool. The languages that support LHS type inference all have style suggestions that you use it when the type is either not especially relevant or is clear from the expression RHS. People don't complain about type names when they are useful. People complain about type names when they are extraneous noise. That is when LHS type inference is useful.
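A rough C# sketch of that guidance (my own example, not taken from any particular style guide):

using System.Collections.Generic;
using System.Linq;

// The type is obvious from the right-hand side, so repeating it would be pure noise:
var names = new List<string> { "Ada", "Grace" };

// The type is not obvious from the call, so writing it out serves as documentation:
IEnumerable<string> shouted = names.Select(n => n.ToUpper());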
Some of us do think this about Python, at least for serious work. Many, many Python programmers admit that code in it needs a lot more testing to make up for the lack of built-in type safety, and guard against runtime violation traps.
So, no, it's not ridiculous. Overstated, possibly. But I am less interested in Go than ten minutes ago.
Textual type annotations and static type checking tend to go hand in hand. There's no point in using type annotations if the language doesn't do anything to validate that they are true.
If there's some level of confidence you're missing when using Python because of the lack of explicit types, you'd be better off making up for it using asserts rather than additional tests.
I think his point is that Python developers often do wish that they could have type validation written into the language, rather than having to litter code with assertions.
You can't really discount the benefits of type inference if you're using the most basic form. Hindley-Milner style type inference feels 1000000x better and IMO doesn't hurt readability. What I find in real-life code is that the overall structure is way more important than types--I'm not a type checker when I'm reading code, so there's no reason for me to care about what type a function takes. Further, in languages with implicit type conversions, overloading, inheritance, etc. knowing types doesn't really give you much info.
The hardest code bases I've worked with have been very explicit with type info and that's done nothing for me. The examples here feel very contrived; very little of the code in a 100k LoC project is gonna use types like Maps and ints, especially if it's OOP. Knowing that something is an AbstractIterator does very little to help me if I don't know WTF an AbstractIterator is. And complex inheritance chains absolutely destroy my ability to find the method I'm looking for on these types.
Coming from functional programming, it really doesn't feel so different looking up a function vs. looking up a class.
Regardless, every language has some degree of inference, otherwise every function call would be:
let result : int = (f : (int, int) -> int)((x : int) + (y : int), 2 : int)
If your code looks like this:
f(g(x, y), a(b(c)))
then inference isn't really doing much to hurt you. And what I've found is that for any large-scale project you see a lot of this type of code.
> very little of the code in a 100k LoC project is gonna use types like Maps and ints
Why do you think that? Map is a super common Java type, which is what the example with the most Map in it is written in. Hard to see where 'complex inheritance chains' come into it, as well.
What I mean is that there are a lot more custom types in real-world code, and custom types aren't gonna be something you're already familiar with when reading new code.
Complex inheritance chains come into this because it makes it hard to track down the code you're looking for. Like let's say we have Cat : Feline & Pet, Feline : Mammal, Mammal : Animal, Pet : Property, Animal : Alive. Now when I do cat.eat(x) it's impossible to predict where eat is defined. Knowing that it's a Cat didn't make it any easier to understand the implementation of eat.
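As a hypothetical C# sketch of that hierarchy (Pet, Property and Alive become interfaces, since a class can only have one base):

interface IAlive { }
interface IProperty { }
interface IPet : IProperty { }

abstract class Animal : IAlive { public virtual void Eat(object food) { /* base behaviour */ } }
abstract class Mammal : Animal { }
abstract class Feline : Mammal { public override void Eat(object food) { /* feline-specific */ } }
class Cat : Feline, IPet { }

// Given cat.Eat(x), the static type Cat doesn't tell you whether the implementation
// you need to read lives in Feline, Mammal or Animal.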
I know what complex inheritance chains are, I'm still not sure how they have much to do with this article. Your comment to me reads like you're trying to forklift a generic critique of OOP on top of this rather slender, content-wise, post. There's not enough in the post to support it and the generic critique is still, well, generic.
OOP is a part of it, but what I'm trying to get across is that knowing the type of a variable doesn't make the code substantially easier to comprehend. In small examples it might, but as code scales up abstractions are much more confusing than lack of type annotations. Specifically, complex types can hide important type information far better than type inference.
I really like the distinction Crystal makes here, which is that values on the stack are usually inferred, whereas values on the heap are a lot more explicit (though inference can be applied to some level there too, in initializers). It makes it nice to work with while at the same time encouraging reasonable amounts of thinking about memory layout.
Not OP but I think hurt is correct. My understanding is that if you're passing return values directly into functions then there's nowhere to annotate the types of those values, but many (most?) people are perfectly okay with that, even though you could argue that the types are being inferred there too (in that they're being left implicit, just as they are with an inferred local variable).
A lot of real world code I encounter doesn't bother with temporary values for everything and giving them explicit types, which is essentially the equivalent of just using inference here. Thus in this case inference isn't doing anything negative because the code would look like that regardless.
One thing that's left out of here is compile time. In one Swift project which was making heavy use of generics, I was able to decrease the compile time by double digit percentages by declaring types explicitly in a few places.
Also, I've noticed in experimenting with Rust lately that use of type inference in code examples slows down the learning process. Oftentimes I have to look up a few method signatures to figure out exactly what's happening in a given code block.
That said, I can't imagine working in modern languages without type inference altogether. I tend to agree with the last argument that the author doubts: if I repeat redundant type information several times in a scenario where it should be obvious, that makes my code less clear.
"Type inference is pushed by many tools and languages, but there is little evidence that it is beneficial. I argue that it may hinder comprehension and increase cognitive load."
If you write C++ STL code or C# LINQ code or any other generics/template heavy code you probably will appreciate type inference as a huge improvement.
C# dev here. I use this literally every day, it saves me keystrokes, makes the code much more readable, and I almost never have to stop and think about what type has been inferred. When I do, a quick hover will tell me that. In the time it takes these "quite clever" academic folks to get funding and conduct research and tell us definitively whether or not this already-established technology is useful to us, millions of developers will have created a shedload of everyday business applications, partly aided by little productivity boosts such as this.
My biggest gripe with type inference is compile time performance (in regards to Swift especially).
When I'm writing hundreds of lines of frame layout code, I have to explicitly write `let transientValue: CGFloat = ...` hundreds of times or else the compiler quits on me, saying that it was unable to infer the required type before timing out. The error makes sense if you think about what the compiler is doing (solving a system of equations over all of the number-like built-in types) but frankly it's unnecessary to begin with.
Compiler performance aside, it also makes working on code a lot heavier since the editor needs to be able to infer the types (something the full-blown compiler struggles to do given all cores and tens of seconds!) before it can give sane autocomplete suggestions. I'm not sure if a better IDE than Xcode could handle this better, but I've found the tools have forced me to be explicit just so I can get my work done more quickly.
Yeah, type inference shouldn't really take a huge amount of time like that - for example Rust's type inference is a pretty small part of the overall compile time (although it can take longer if you do some crazy stuff with traits).
In Swift's case it seems like they've tried to add very flexible type inference (leaving off lots of stuff) to a very flexible type system (with inheritance, overloading, protocols, etc.). This is something that academics have known for decades causes problems with type-checking performance. Similar problems have caused the designers of Scala 3 to go back to the drawing board and redesign the type system from the ground up, with a clearer formal semantics.
I do know that the Swift team are working really hard to make on demand editing/compiling better, but I don't know how far that work has come. Hopefully that improves things over time!
One place where I've found inference to be absolutely essential is some of the more complex asynchronous Rust code. There are probably a handful of issues at play here, but the end result is that often it is very hard to even know the exact type of a value. Sometimes, even if I have a feeling of what it is, the exact type incantation is hard or impossible to come up with. But if I just want to use a value where it matches an appropriate trait, I don't actually care about the exact type.
Likewise in C++. And, in those cases, you really don't want to, and you know why you don't want to.
But type inference in Rust and in Haskell can get pretty wacky, with the type determined by code half a page away. The built-in limits in C++ seem well-advised; anyway it is very rare to hear anyone asking for more, or complaining about code over-depending on it.
It is very common, in C++, to drop in a modifier, like `auto* p = f()` for both clarity and insurance. Usually that is just the right amount of type information.
> But type inference in Rust and in Haskell can get pretty wacky, with the type determined by code half a page away.
Hence the general guideline (in Haskell at least) that all top-level definitions should have a type ascription. You can also add ascriptions locally if you have a large function that you think would benefit (readability-wise) from it.
Indeed. It's a similar thing in Scala, and I actually find it kind of limiting because Scala (like Rust, AFAIK) doesn't have "where expressions" or "declare a function without types just because", so you're actually discouraged from making really local functions that just frobnicate-the-transmogrifier, because every non-anonymous function requires a declaration.
(I still very much appreciate Scala for what it is. No real experience outside of toys with Rust.)
My typing speed is not the bottleneck for me either, but the double bookkeeping certainly is. It takes me from "problem solver" to "code writer", a context switch I gladly live without. Type inference helps with that.
I'll second that. Similar problem reading code. All that cruft reduces the signal-to-noise ratio. Most of the time when I'm reading code I don't care what the types are, just what it does at a high level. Without type inference, step one is always scan the code and screen out all the boilerplate to get at the actual logic. When a lot of the code is boilerplate I end up spending half my mental capacity just remembering the bits that actually do stuff. If it's really bad I'll even rewrite it in pseudocode in a comment. Interestingly, in Swift my actual code and that pseudocode are almost identical. By contrast, Obj-C, which some people claim is very readable because it is so verbose and Englishy, I find to be fairly unreadable. There's just so much noise that I'm parsing out and ignoring like 80% of it. Even in Swift some people still manage to write C-style code that's half intermediate types and variable names, which just obscures the functionality.
One huge benefit of (robust) type inference is type holes. That is the ability to leave parts of your code unwritten and have the compiler infer what the type of those unwritten parts should be, and in more advanced cases, fill in the gap with candidate code.
The latter is something that isn't in any languages I would consider production-ready (although it seems tantalizingly close), but the former is already quite useful depending on how much type inference your language is capable of. If paired with the ability to search for code by type, it makes exploring APIs extremely friendly. Even without that ability, I've found the ability for the compiler to tell me what type belongs in a given empty chunk of code to be extraordinarily helpful for those times that I'm programming in a top-down fashion (scaffolding first, then fill in the concrete implementation).
Not OP, but my understanding is that it's a common approach in Idris. Here's the creator of the language deriving the implementation of zipWith without writing any code: https://www.youtube.com/watch?v=X36ye-1x_HQ&t=1274
How exactly are you supposed to refactor the last example to make it more readable? You could extract parts of it into separate methods, but I think splitting nine lines of code into methods of four or so lines would increase cognitive load. You could also alias the types with using, but I'd argue that obscures the types just as much as var. Developers would have to hover over the declarations to see the real types either way.
This is the worst possible argument against a language feature. Hideous code is exactly the kind of code that is most difficult to read, and so exactly the kind of code where you want the most help from your language.
Welcome to the first lesson on how to sell your favorite framework and/or coding language:
Step 1: Take some convoluted example code in another language/framework that you want to shit on; this example should break as many good coding practices as possible.
Step 2: Show a simple example that adheres to good coding practices in your favorite language/framework.
The supposed advantage. Many perfectly productive languages don't even offer the option to specify the type at all, and not just in the inference case. As far as I'm aware, nobody has ever convincingly demonstrated (with empirical data) that this is worse along any metric that people care about (development speed, bugs, cost, etc.).
A lot of people have preferences on this front. Somehow, these preferences get promoted to best practices if not moral license. Just, which approach is claimed to be better really depends on who you ask, and then mainly on whichever language that person first cut their teeth on. And nobody's going to be able to cite any well-designed study that demonstrates their preference is actually superior, because none exist.
I agree with your examples, but that code is pretty trivial. There is a point where inference can be over used, and that can cause readability issues. The OP ended with a call for empirical research.
I think the reference to the refactoring in Visual Studio is a bit bogus in that it doesn't really imply anything about preference; you can swap between explicit and implicit. You can even specify whether you prefer implicit or explicit typing for automatic code cleanup.
I've never really struggled with this in C#, where nearly everything is implicitly typed. It's never been a problem in non-editor code views (like GitHub) either. One of the major benefits, not mentioned, is that when you change the type of something it doesn't cause needless code churn.
There are only a few cases where I've seen it be a problem. One is people new to programming, for whom even remembering all the basic types is a significant cognitive load and templated types look like unholy magic. They'd struggle to know what var sum = 1 + 2 + 3 would result in and really want to know.
The other, in C#, is where you end up with an IEnumerable instead of a List. This can result in some not-so-nice side effects.
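The usual example of that is deferred execution: a LINQ query inferred as IEnumerable<T> is re-evaluated every time it's enumerated, where a List<T> would not be. A small sketch (made-up values, but the behaviour is standard LINQ):

using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3 };

// Inferred as IEnumerable<int>: the filter runs again on every enumeration.
var evens = numbers.Where(n => { Console.WriteLine("filtering " + n); return n % 2 == 0; });

Console.WriteLine(evens.Count());   // "filtering" printed three times
Console.WriteLine(evens.Count());   // ...and three more times

// Declaring (or materializing) a List<int> makes the work happen exactly once:
List<int> evensList = numbers.Where(n => n % 2 == 0).ToList();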
Visual Studio defaults to complaining about using var. The MSDN article I linked recommends not using var unless you have to. It seems quite clear what their preference is.
This is maybe slightly off-topic but one reason I prefer Java to C# when it comes to reading code outside an IDE is that it forces you to explicitly declare where every external class is being imported from, which makes it much easier to go looking for the definition if you need it, when you don't have access to the IDE's right-click "Go to Definition".
Many IDEs can automatically insert the type of a variable when developing. This seems preferable: have one person do this one action once, rather than every person reading the code in the future having to do work to decode the type.
I think if it avoids having to repeat the type then type inference helps. And especially in that last Java example, I feel it is useful:
for (var x : someMap.entrySet()) {
String key = x.getKey();
Entity value = x.getValue();
...
}
Here, the reader just needs to look at the next two lines to understand the type of x. In the article, the key and value types were long and complicated, and they asked us to refactor. But in the above example, the types are very simple, yet it's still better than
for (Map.Entry<String, Entity> x : someMap.entrySet()) {
String key = x.getKey();
Entity value = x.getValue();
...
}
I feel in this example, the type in front of x doesn't help me, as a reader.
C# developer here, which has local variable type inference. It is appropriate in 95% of places. It hasn't caused an apocalypse yet. It has, in fact, been great!
This article tackles a very limited form of type inference that is far less useful than global type inference.
The benefit of type inference is that there is less to write, which makes changes quicker to make.
Type inference does not preclude adding extra annotations if they would help.
A good ide will show you type annotations at all times, if you choose. No small actions are required.
I would say that there actually is a problem with global type inference. If two types do not match (e.g. usage and implementation of a function) then most systems do a poor job of guessing which part (or both!) is broken.
At some point, if a module/pass/service is large enough, it becomes worth investing in a local naming convention that carries the relevant type information, and enforcing that convention through code reviews. Module-specific Hungarian notations, if you will.
I'm working on an inference pass in a compiler, and it's up to 2000+ lines of F# manipulating tuples/dimensions/tables/chains/indices, and identifiers referencing these.
1. Each variable in the module can have one of these ten types, not immediately obvious from their names. While I can hover over the variable and have the IDE tell me the type, it happens so often that it feels like hunting for an invisible cow.
2. a typical method will contain several variables related to the same concept. When inferring a "group by" statement, I will have an "origin tuple", from which I infer the "origin chain", from which I extract the current "origin table", and construct an "origin index", from which I discard duplicates to obtain an "unique index", which defines a new "unique table" for which it is the primary "unique dimension", and there is a corresponding foreign "origin dimension" on the "origin table". Some of these will also need to be indirected through identifiers.
In the end, there's a "hungarian.md" file in the module that explains that a `vec` suffix is a vector, `vecs` is a vector tuple, `vid` is a vector identifier, and so on.
Interesting problem! So much of 'reconstructing bits of programs' is about satisfying the constraints of a fixed ASCII representation of code. I am hoping that structured editing might be able to help with this, by allowing you to hide or show bits of code without relying on the type checker, but we need to have good ways of integrating with other tools for that to take off (like source control sites).
It depends. When I was writing almost purely functional programming code in Scala I was frequently cursing at type inference. Each explicit type hint helped me more easily understand what each next transform was doing.
More modern languages (Rust, Swift) have also gone and said "well, omitting types from arguments and return values of function signatures is a bad idea, so let's not do that."
One thing which I don't often see mentioned is that when you refactor (e.g. rename a type), with type inference less code is changed therefore fewer git diff "hunks". That means easier merging.
For example:
var thing = getThing();
processThing(thing);
If you rename the type that getThing() returns, for example from "Thing" to "TheThing", without type inference the above code has to change (the type of "thing" has changed); with type inference no characters in the source code are changed.
If somebody else makes a change to those lines, e.g. writes the whole block in an "if" statement indenting it, without type inference you now have a merge conflict you have to resolve. With type inference, merging works without manual conflict resolution.
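To make the contrast concrete (same hypothetical getThing/processThing names as above):

// Without inference, the rename from Thing to TheThing touches this line too:
TheThing thing = getThing();
processThing(thing);

// With inference, the line stays exactly as it was, so the diff hunk never appears:
var thing = getThing();
processThing(thing);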
This is a limitation due to the fact that the compiler and the editor are different software. The compiler knows the type, but your editor doesn't, so the burden is thus on the programmer to either know or annotate the type by themselves.
This is a dumb thing that every programmer just accepts. The tool should do this work for you.
I don’t accept this, which is why I use IDEs instead of text editors. The editor in a reasonable IDE absolutely knows the types. Or maybe I’m missing what you’re saying.
Somehow the lack of types in source code isn't making programming in Python harder for most people. And if you need to know the type of something, your IDE should help with that. And yes, you should use an IDE/language server in this day and age, because it makes you more productive.
The "auto" keyword is a life saver when dealing with templates/pointers etc. in C++. It is a real time waster figuring out the correct syntax for declaring your identifier when you already know the type's semantics perfectly. This is probably more of a flaw of C++ than an argument for type inference, though.
Not once have I wished for type inference in Java for instance. Just learn to use the damn IDE and its code completion feature.