The following is based on conjecture, as I'm not old or well-read enough to be sure of this, but it seems to me that the original purpose of type systems got muddled up by the tremendous popularity, mostly in Windows systems, of the "Hungarian" variable naming style. When your "type system" consists of adding three letters to the beginning of a variable name, you don't have a way to make descriptive types. Java, in most of the ways I've seen it used, is essentially Hungarian notation enforced as a language feature.
I realize that variable naming and type systems are two different things, but it seems many programmers never realized the point of type systems (expressing the type for the programmer's sake) because they only ever saw it used to distinguish abstract primitives. For a long time, I had trouble understanding that Hungarian notation wasn't a type system, because they seemed to do precisely the same thing - that's how limited my understanding of types was. The kind of style seen in this article was alien to me for a long time, but it was enlightening to realize that the type system is there to help me out, not just make me type a bunch of unnecessary crap.
I think what you're getting at is the distinction between this:
float x;
and this:
kilogram x;
The former, as you say, only tells you about the underlying representation. It says "this is a float, so the computer should store it in such and such a way." That's fine, but it doesn't tell us enough.
The latter example is far more useful. In my imaginary language, the declaration implicitly tells the computer to store the value as a float, because the kilogram type has been defined as such elsewhere. But that's not all it does! It tells us and the compiler that this float represents a real-world quantity measured in kilograms. It prevents us from mistakenly passing kilograms where pounds were expected, or seconds where kilograms were expected.
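A minimal sketch of that idea in Haskell (the type and function names here are made up for illustration):

```haskell
newtype Kilogram = Kilogram Float
newtype Pound    = Pound Float

-- A function that only accepts kilograms.
recordMass :: Kilogram -> IO ()
recordMass (Kilogram kg) = putStrLn ("mass: " ++ show kg ++ " kg")

main :: IO ()
main = recordMass (Kilogram 68.0)
-- main = recordMass (Pound 150)   -- rejected at compile time: a Pound is not a Kilogram
```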
On the Haskell front, you might be interested in the Dimensional library, which does just that. It also works elegantly with multiplying and dividing units. E.g. if you have a miles value and an hours value, you can divide to get a miles per hour value.
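A rough sketch of what using it looks like (module, unit, and type names here are from memory and may differ across versions of the library):

```haskell
{-# LANGUAGE NoImplicitPrelude #-}
import Numeric.Units.Dimensional.Prelude
import Prelude (Double)

distance :: Length Double
distance = 100 *~ meter

duration :: Time Double
duration = 9.58 *~ second

-- Dividing a length by a time yields a quantity whose dimension
-- (length/time) is tracked and checked by the compiler.
speed :: Velocity Double
speed = distance / duration
```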
You can even take this a step further: using `float` instead of `kilogram` is leaking an implementation detail. It's (probably) too low-level a piece of information to actually be useful.
Luckily, more and more programming languages are making it easier to create these simple types. In Haskell, for example:
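A minimal sketch of the usual form:

```haskell
-- Kilogram has the same runtime representation as Double,
-- but the compiler treats it as a distinct type.
newtype Kilogram = Kilogram Double
```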
The issue with doing this, of course, is that it's mostly useless for computation. You lose the ability to use mathematical operators because the new type is no longer an instance of Num, and even if you create the Num instance, or use -XGeneralizedNewtypeDeriving, you can't multiply a Kilogram by a Meter/Second, for example, since the arguments to (*) must be of the same type. One would need to use a generic "Measure" type instead, where the unit is metadata attached to the value, and the Num instance implements the type checking on units.
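A sketch of the limitation, with hypothetical unit types and -XGeneralizedNewtypeDeriving supplying the Num instances:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Kilogram        = Kilogram Double        deriving Num
newtype MetersPerSecond = MetersPerSecond Double deriving Num

-- Within a single unit, arithmetic works: Kilogram 2 + Kilogram 3 is fine.
-- Across units it does not, because (*) :: Num a => a -> a -> a forces both
-- arguments to have the same type:
--
--   momentum m v = m * v   -- rejected when m :: Kilogram, v :: MetersPerSecond
--
-- so you end up unwrapping by hand (and the result loses its unit entirely):
momentum :: Kilogram -> MetersPerSecond -> Double
momentum (Kilogram m) (MetersPerSecond v) = m * v
```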
You're quite right about the limitations imposed when you make your own dimensional types like that. It turns out the problem isn't trivial. Which is why I prefer to use a library like Dimensional. With Dimensional, you get plenty of units out-of-the-box, you can define your own if needed, and you can perform arithmetic in a fairly natural way.
And unfortunately even with a "generic Measure type" you can't actually enforce anything at compile time. The "Num" typeclass was not very well designed (given the current facilities of the language, some of which didn't exist at the time Num was designed, so no aspersions cast at the designers!).
Right. An interface exposed with integers may wind up using floats internally, or vice versa; numeric algorithms may wind up doing arbitrary things to precision and accuracy; &c. "You are passing around a float" puts some bounds on what can be delivered but that isn't really sufficient information when it matters and is useless fluff when it doesn't.
"So you've got to typedef single-field structs instead."
Which, thankfully, don't add any runtime cost (as should be expected).
As an aside, you don't actually need to typedef them at that point, but it saves you having to write "struct ..." everywhere so it's typically worthwhile.
Fahrenheit is not easily addable given the current implementation and, if I recall correctly, somewhere in the source code there's a comment in which the author explicitly states he doesn't support units that don't correspond linearly to the SI base unit.
Exactly, which makes me wonder why the library can support Celsius but not Fahrenheit.
To say it in more detail: To transform between Celsius and Kelvin, you just translate. (That operation is supported in Dimensional.)
To transform between Fahrenheit and Celsius or Fahrenheit and Kelvin, you translate and scale.
We know that Dimensional can scale, because it supports things like miles to kilometers. We know it can translate, because it supports Celsius to Kelvin. Is it the combination of scaling and translating that makes Fahrenheit impossible?
Yeah - this is exactly the kind of distinction I was trying to make. Reminds me of Abelson's famous quote, "Programs must be written for people to read, and only incidentally for machines to execute." That's an idea that's largely forgotten when it comes to type systems.
Thanks for the pointer - I'll be sure to check out the library as I dive into Haskell over the next few months.
I've had a long-standing interest in this kind of semantic typing. Ironically, one of the things people hate about Java is that it has a deeply engrained culture of using semantic typing everywhere, except that it uses class "boxing" rather than just having something like the type system in Go (typedef on steroids) or something axiom-based like Haskell.
So that's why so much Java code ends up defining a new class for every little domain concept.
In fact, it's not uncommon to see things like class Furbie extends FuzzyThing, so you end up with very specific types that don't necessarily behave differently, but do add some supposed semantic precision to the code. Unfortunately, you don't get much benefit from it in Java, so it just ends up being a hassle, resulting in lots of programmer-forced unsafe casting and just as many runtime errors as you'd have got with simpler types. This is not a fault of the language, just poor implementation which rightly or wrongly appears to be idiomatic in Java.
The type system in, for example, Go is much simpler, and allows you to create semantic types that actually are the underlying type instead of creating a class wrapper (called boxing in Java, e.g., boxed int is class Integer, which is an object containing only an int), which means you can define a type like 'kilogram' that's really just a float, but the type system won't silently cast a 'kilogram' value to a float or vice-versa because it knows they are different things.
I haven't got into Haskell in much detail yet, but it basically takes that a few steps further and allows you to define the characteristics of each custom type based on axioms that describe things like whether it's commutative or associative, the range of allowed values, etc.
This kind of axiomatic semantic typing is clearly a lot more powerful, but it hasn't really found its way into mainstream languages yet. Go takes the view that it's too complicated for what is intended to be a low-level language. Rust looks interesting because it aims to be equally low level, but provide some of the same rich typing you'd find in Haskell. Unfortunately it's far from stable, but it'll be interesting to see where it ends up.
> Ironically, one of the things people hate about Java is that it has a deeply engrained culture of using semantic typing everywhere, except that it uses class "boxing" rather than just having something like the type system in Go (typedef on steroids) or something axiom based like Haskell.
I think that people hate the implementation and syntax associated with semantic typing in Java, not the fact that semantic typing is widely used in Java.
I think both much of the recent golden age of dynamic languages and the more recent resurgence of cleaner, less-heavy-syntax statically typed languages have been motivated by the perception that static typing in Java (and similar languages) has too high a cost for the benefits it provides.
Yeah, I suppose my description of Java as compiler-enforced Hungarian notation isn't entirely fair. I often wonder if Java's problems in terms of bearability are as much syntactic as they are semantic.
As an alternative, check out how Go does this. It's a nice compromise that lets you declare kilogram to be a float, and when used alone it treats it as a separate type, but it doesn't get in the way when you use it in an expression.
That seems useful, but dangerous. If you specify (for some reason) both a centimeter and inches type, would you be able to mix them in expressions without warnings?
--------
Per [1] it seems that since both would be named types you wouldn't be able to assign one to the other without explicit conversions. If my reading is correct, then that addresses my concern. It's things like assigning a `float` to an `inch` type that can happen without explicit conversion, as long as `inch` has `float` as its base type.
They are separate types, not a type alias, but some conversions are automatic. The rule for assignability also affects comparisons and function calls (including expressions containing them), but apparently not all expressions like I thought?
In Haskell? I wouldn't do that. That improves readability, but it doesn't allow the compiler to enforce correct unit usage. If a Kilogram is just a synonym for a Double, you can pass it in where other meanings of Double are expected. E.g. you can pass Kilograms into a function that takes Pounds. Which you don't want to do.
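A short sketch of why the synonym doesn't buy you anything (the names are illustrative):

```haskell
-- With type synonyms, Kilogram and Pound are the same type to the compiler:
type Kilogram = Double
type Pound    = Double

chargeShipping :: Pound -> Double
chargeShipping lbs = lbs * 0.50

-- This compiles without complaint, even though the unit is wrong:
oops :: Double
oops = chargeShipping (75 :: Kilogram)

-- With newtypes instead (newtype Kilogram = Kilogram Double, etc.),
-- the same call would be a compile-time error.
```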
The type system is what the compiler (or interpreter) knows about your program. Sophisticated type systems know a lot, and can tell you a lot. Sophisticated does not mean typing a lot.
When building code for the long term, you really want to focus on techniques that make it impossible to misuse an API, whenever it's possible to do that. This article nicely calls out one such technique.
Generic typing can be used to (try to) force the consumer of an API into correct usage. Judicious use of final and abstract in the land of Java can be used to force/guide eventual overriders of an abstract class down the right path -- they'll get compilation errors if they don't at least implement the right methods.
Whenever you find yourself writing an assert method, take a beat and figure out if the type system could have turned that into a compilation error instead.
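A small Haskell illustration of that trade, with hypothetical functions: the first version checks non-emptiness at runtime, the second makes the empty case unrepresentable so the mistake is caught at compile time instead.

```haskell
import Data.List.NonEmpty (NonEmpty)

-- Runtime check: callers can still pass [], and only find out when it runs.
averageChecked :: [Double] -> Double
averageChecked xs
  | null xs   = error "averageChecked: empty list"
  | otherwise = sum xs / fromIntegral (length xs)

-- Compile-time guarantee: an empty list simply isn't a valid argument,
-- so the caller is forced to handle the empty case before calling.
average :: NonEmpty Double -> Double
average xs = sum xs / fromIntegral (length xs)
```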
"The type system is what the compiler (or interpreter) knows about your program."
Compilers know a lot more than just what is represented in "the type system" (as it's typically thought of, anyway). Control flow (basic block analysis, &c) for instance. Which isn't really taking away from your larger point - certainly, things represented in the type system are things the compiler will know about.
"Whenever you find yourself writing an assert method, take a beat and figure out if the type system could have turned that into a compilation error instead."
Possibly with a _Static_assert (or static_assert) as of C11 (/C++11)! Obviously there are substantial limitations still, but it's great to be able to pull more to compile time cleanly.
This discussion reminds me of a 2005 article from Joel Spolsky called "Making Wrong Code Look Wrong". It has a similar point to this article: there are things we can do as programmers to make our code less error prone. It's definitely worth a read if you're interested in this topic and haven't read Joel's article before.
Type systems in languages make me less productive.
Unless I have to provide support, add features, scale my application, write bug free code, hire additional developers, or explain my thought process to anyone else.
Right, it's well known that type systems are essentially useless, except when you need to write high quality code, and do all those things you mentioned. Basically they're unnecessary (but only if you use the term unnecessary very literally).
> val player = new Player(new Vector2(100, 100), new Vector2(50, 50))
> No problems here. The first noticeable crack in the system comes after a few weeks vacation away from this code. You come back, and you want to change the starting point of the player. Will you remember which Vector2 to change?
player = Player(position=Vector2(100, 100), size=Vector2(50, 50))
The problem is not typing here, it's lack of named arguments.
Named arguments by themselves do not fundamentally prevent data type errors when the underlying types are identical. With short variable names (v1, v2) or some other cosmetic ambiguity, the mistake will still compile deceptively, with named arguments as the only defense.
Both of the following would compile successfully using named arguments but one of them is semantically wrong:
Player(pos = v1, size = v2);
Player(pos = v2, size = v1);
Using the compiler's static type checking enforces correct semantics more reliably than named arguments do. Of course, this only works if the programmer creates new differentiated types so that they are no longer identical (which was the crux of the article.)
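Sketched in Haskell with made-up Position and Size wrappers, the same swap becomes a compile error rather than something named arguments merely discourage:

```haskell
data Vector2 = Vector2 Float Float

newtype Position = Position Vector2
newtype Size     = Size Vector2

data Player = Player Position Size

v1, v2 :: Vector2
v1 = Vector2 100 100
v2 = Vector2 50 50

ok :: Player
ok = Player (Position v1) (Size v2)

-- swapped = Player (Size v2) (Position v1)   -- rejected: a Size is not a Position
```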
I agree that in languages without compiler-defined types, but with other tools such as named arguments, the class of errors described here can to some extent be avoided by making good use of them. I'm gonna do this more often in my Python code!
Dynamically typed languages are expressive, quick and fun.
But strict static typing is like a seat belt, it's annoying, but it just might save your life.
Also ceremony code is there for a reason; that reason is not to annoy you, it is to make sure that those who come after you know what it is that you have done and why.
> But strict static typing is like a seat belt, it's annoying, but it just might save your life.
The problem with saying "static typing" without further precision is that on one axis it ranges from C, where the seatbelt is made of paper, to ATS or Idris, where the "seatbelt" is a zero-zero ejection seat; and on another axis it ranges from Java, where you have to braid the seatbelt from raw fibers every time you sit down, to MLs or Haskell, where the seatbelt magically appears around you.
> The problem with saying "static typing" without further precision is that on one axis it ranges from C, where the seatbelt is made of paper, to ATS or Idris, where the "seatbelt" is a zero-zero ejection seat; and on another axis it ranges from Java, where you have to braid the seatbelt from raw fibers every time you sit down, to MLs or Haskell, where the seatbelt magically appears around you.
Regarding C, it is weakly typed, so it doesn't care whether you feed it apples or habanero chillies; the static type system isn't for safety, it's for improving compilation speed (which was often a big headache in the 1970s).
It sticks with the methodology of C: if you make a mistake you pay for it dearly.
There is an entire lineage of real object-oriented languages with nominal types, starting from the dynamically typed Smalltalk (research on which is where we learned Hindley-Milner was horribly ill-suited to OOP in the first place). Or, if you prefer, just look at how you use your natural language to name things and reason about the world.
On the other hand, so many people have abandoned the most common forms of OOP. See for example the story from yesterday [1]. Many writers have written about how brittle object hierarchies are in anything nontrivial, [2] for example.
I will say though that newer models of OOP, focusing on interfaces and composability of behavior, seem to have quite a bit of steam left in them.
Don't count OOP as being dead yet, the FP stuff is useful (I use it myself), but we have our own better abstractions coming out, like having more fun with mixin-style inheritance. The FP + OOP crowd is pretty vocal, I'll give you that.
One language that enforces strong typing is Ada. It has compile-time and run-time type checking. It also has many other features to make the code less error-prone and more maintainable. It was used by the DoD and also used in critical applications, such as nuclear power plants. The early versions of Java reminded me very much of Ada.
As far as I know, there's no runtime type checking in Ada, but there is runtime value checking to ensure values are within the specified ranges/adhere to the specified predicates (a very cool feature; you can specify that a type is always even, and encountering an odd value will cause an exception).
Oh, there is runtime type checking: it covers subtypes (which is mostly range checks) and checks of tags for tagged types (usually in class-wide subprograms), though both are often optimized out at compile time.
I'm hoping programming grows up enough to remember languages like Ada. Dynamic languages are useful for "get something running quick, then work on getting it right". Ada makes you get it right up front, or it doesn't even compile. The former is good if you have to bang out a script this afternoon; the latter is good if you have to maintain a code base for 10+ years, with multiple programmers, and some assurances about reliability.
The way these are implemented, the short answer is yes. There will be more memory pressure on the garbage collector in Scala. The long answer is, "mmmm, maybe, you should test it", because, due to escape analysis, eden-generation collection speed, the JIT, etc., it can get very complicated very fast. It can also be very dependent on what you mean by "significant penalty" and "performance".
As an aside, Scala does allow for compile-time-only small types in the form of AnyVals. They won't help in the examples in the article, but for the simpler kilogram-wrapper-around-float case they will, and there will be no performance penalty, as the wrapper is elided by the compiler.
It depends on the implementation. I'm not familiar with Scala, but in Haskell, using `newtype`s to achieve this creates absolutely no extra runtime overhead, only compile-time overhead when the types are checked.
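There is one wrinkle, though. Consider code that wraps every element of a list in a newtype constructor (a reconstruction of the kind of example meant here, with a hypothetical Foo wrapper):

```haskell
newtype Foo a = Foo a

wrapAll :: [Int] -> [Foo Int]
wrapAll xs = map Foo xs
```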
which should be a no-op at runtime, because the representations of Foo x and x are the same, but due to the newtype, the traversal does need to occur. [1] discusses a way to avoid these problems. Basically, these newtypes should only ever be needed during type checking and should really be erased after that; once you've type-checked everything, you know your program won't ever confuse a plain Int with a Foo Int, so you can then get rid of the compile-time tag and gain better optimisations.
As a general rule, types are dealt with at compile time, and will only influence the optimizer leading to faster code...
But as people have already pointed out, in practice compilers are never optimal. When it's important (that is, rarely), it's good to test, or at least read the compiler's manual.
I came across this a few weeks ago; I think the first example has to do with arrays being second-class.
On the other hand, if you give data to a C programmer, they can locate its position in RAM. Give an assembly programmer a simple data type, and they can point to the specific register(s) in the processor.
The general principle makes a ton of sense: let each line of your code express as much of your intent as possible. That way, when you have a huge codebase, there are all these useful little "hooks" throughout it that will be useful when you need to refactor.
Your codebase will "say" more, and therefore your tools will "know" more, and therefore your tools can do more for you.
Oh, but one more thing. It's very nice if the abstractions you use to do this are resolved down to nothing at runtime. So there's no penalty at runtime for the precision you added to the code. Let the compiler do the hard work so that you can say "Position", but at runtime, the code is as if you had written "Float".
I'm actually surprised that people learn about this only now... It is quite standard (and very common) thing to do in Ada.
But then, we have been able to write "subtype Strength is Natural range Natural'First .. 32;" since the '80s (a numeric type with all operators defined and a permitted range of values from 0 to 32; range checks are performed at runtime by the compiler, but most of them are optimized out anyway, so the penalty is in the single-digit percent range). Or, if you prefer an example from that site: "type First_Name is new String;"
While I agree with the idea of the article, what's the logic for not using "small types" for all parameters? Or, to put it another way, how do you decide a given method should have "small types" used for parameters?
Also, although it makes the code much more verbose, named parameters (as used in, for example, Objective-C) prevent any potential bugs when refactoring methods' signatures, right?
> Or, to put it another way, how do you decide a given method should have "small types" used for parameters?
I think a good rule of thumb is to ask yourself what "dimension" or "unit" the parameter has. (See also: dimensional analysis.) You never want to pass 4.33f radians into a function expecting 4.33f newtons!
A few more examples:
Distance in meters, Distance in feet, Speed in m/s, speed in yards/minute, Acceleration in m/s^2, Acceleration in cm/s^2, Mass in grams, Weight in pounds, Force in newtons, Force in dynes, Temperature in Celsius, Temperature in Fahrenheit, Angles in radians, Angles in degrees....
"But I don't do physics programming!", you say? Well, there's plenty more:
Time in seconds, Time in milliseconds, Distance in pixels, Distance in inches, Numeric ID of a Foo, Numeric ID of a Bar, Bytes in UTF8, Bytes in ASCII, US currency in dollars, US currency in cents, Euros, interest rate (yearly), interest rate (monthly)...
And that's not even getting into industry-specific dimensions that might exist.
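For instance, even a humble pair of time units can be kept apart (the wrappers and sleepFor here are hypothetical):

```haskell
newtype Seconds      = Seconds Double
newtype Milliseconds = Milliseconds Double

toMilliseconds :: Seconds -> Milliseconds
toMilliseconds (Seconds s) = Milliseconds (s * 1000)

-- An API that is explicit about its unit.
sleepFor :: Milliseconds -> IO ()
sleepFor (Milliseconds ms) = putStrLn ("sleeping " ++ show ms ++ " ms")

main :: IO ()
main = sleepFor (toMilliseconds (Seconds 30))
-- main = sleepFor (Seconds 30)   -- would not compile; the unit mix-up is impossible
```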
That brings up an interesting issue -- is the "right way" to do that to have different types for these, or one type for distance with different factory function/methods for different units. E.g.: should "meters" and "feet" be different types, or should the "measurement" module have both a "meters" and "feet" method that return the same ("distance") type.
It seems to me the latter is more conceptually clean.
OTOH, math might be easier to express with the former (particularly in a language that allows you to define implicit type conversions, so that if you pass an "inch" value to a function expecting a "cm" parameter, it gets automatically converted to the appropriate "cm" value.)
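The latter approach might look something like this (a sketch; the choice of meters as the internal unit and all the names are arbitrary):

```haskell
-- One Distance type; values are built through unit-aware constructors and
-- stored internally in meters by convention.
newtype Distance = Distance Double

meters :: Double -> Distance
meters = Distance

feet :: Double -> Distance
feet ft = Distance (ft * 0.3048)

inMeters :: Distance -> Double
inMeters (Distance m) = m

-- Arithmetic stays simple because every Distance shares the same internal unit.
total :: Distance
total = Distance (inMeters (meters 400) + inMeters (feet 29))
```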
Well, there are really three different kinds of information involved. For example, the variable "target_resist" might have: Computational type (float), Dimension (resistance), and Unit (milliohms).
Most built-in type-systems (sensibly!) only try to tackle the broadly-applicable computational-type, since the rest is context-specific.
However, in a contractual sense, all of those are important; if any are not as expected, you'll get bugs.
OK, I can understand that logic, but then you might end up with dozens and dozens of "small types". So, as far as I understand, those who try to follow that paradigm have no problem whatsoever with the potentially large number of these types, is that right?
(Again, asked another way, is it ever a problem to create a "small type" if it follows that rule? For some codebases it might mean having a "small type" per unique parameter on signatures.)
> Again, asked another way, is it ever a problem to create a "small type" if it follows that rule?
IMO the biggest problem isn't having too many distinct types, because they represent information you'd need to track manually anyway, like knowing that $time is seconds rather than milliseconds. I'd be more worried about:
1. The risk of conflicts between different groups of people who have each defined their own type called "Newtons" and want their libraries to work together.
2. The sliding-scale of inconvenience if your implementation gets in the way of the mathematics, like having to call `foo.asFloat() * 2` rather than `foo * 2`. (You can relax things a bit by relying on best-effort annotations.)
The example given in this post doesn't show this, but what using types allows us to do is understand what a specific value is throughout its whole lifetime through all sorts of functions. We know that this vector is a velocity, and our compiler knows that too, so we can't unwittingly treat it as an acceleration vector further on down the line, whether that's done when passing it to another function, declaring variables, adding things together, or what-have-you. You are right that the specific example shown in the article can be achieved with simple named parameters, but that's a very small and specific application of the very broad protection that proper type use gives you.
According to the Curry-Howard correspondence, types are conclusions and programs are proofs of those conclusions. Every program corresponds to a deduction procedure, such as a Hilbert-style proof or a natural-deduction proof.
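A tiny Haskell illustration: read the type below as the proposition "A and B implies A"; the function body is its proof.

```haskell
-- Proposition: (A ∧ B) → A.  The program is the proof: it produces
-- evidence for A from evidence for A ∧ B.
projLeft :: (a, b) -> a
projLeft (x, _) = x
```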
tl;dr: I really need to learn Haskell.