Is Functional Abstraction Too Clever?

mechanical_fish · on Oct 17, 2009

My verdict: The code is just fine. It's the syntax of the extension methods that is causing the audience to panic and flee:

    public static IEnumerable<TResult> Pairwise<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TSource, TResult> resultSelector)

"Too clever" is, if anything, a polite way for a relative beginner to describe this code. I'd go with "fscking illegible" myself.

When, as a language designer, you find yourself designing syntax like this, it's a sign that you have drunk the Kool-Aid too greedily and too deep. In exchange for precious, precious type safety you have traded away the legibility and learnability of your language. The inevitable result is that, five years from now, you will be one of those frustrated people who grumbles on message boards, wondering why everyone prefers to use PHP.

Try a different language. I suspect that in Ruby or Python this code would not be so "clever". In Scheme it would be par for the course.
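For what it's worth, here is a rough Python sketch of the same idea. The names `pairwise` and `result_selector` are my own, and I'm assuming Pairwise walks overlapping neighbours (a,b), (b,c), and so on; this is not the article's code.

```python
from itertools import tee


def pairwise(source, result_selector):
    """Yield result_selector(a, b) for each adjacent pair (a, b) in source."""
    first, second = tee(source)
    next(second, None)  # advance the second copy by one element
    for a, b in zip(first, second):
        yield result_selector(a, b)


# Differences between neighbouring values.
gaps = list(pairwise([0, 3, 7, 10], lambda a, b: b - a))
print(gaps)  # [3, 4, 3]
```

No type parameters in sight, which is the whole trade: the signature is easier on the eyes, and nothing checks what you pass in until the code runs.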

colomon · on Oct 17, 2009

As an example, here's my quick implementation of the original code in Perl 6. (Note that this is the entire implementation, no helper functions needed.)

    sub GetSlots($slots, $max)
    {
        my @a = (1..($max-1)).pick($slots-1, :repl).sort;
        return (@a, $max) >>-<< (0, @a);
    }

(1..($max-1)) generates a list of integers from 1 to $max - 1. pick($slots-1, :repl) randomly picks $slots - 1 from the first list, replacing the one picked each time (so duplicates are possible). sort does just what you would think.

The second line of the sub uses a hyper-operator to subtract one array from another -- in this case the sorted array of random numbers plus $max on the left, 0 plus the array on the right. The result is also an array, and that's what we return.

It's not a pure functional implementation, obviously -- you could certainly do that in Perl 6, but to my mind this approach emphasizes what is actually going on, rather than hiding it in the Pairwise function. (Without digging into the Pairwise function code, how do you know if it splits (1,2,3,4) into (1,2) and (3,4) or (1,2), (2,3), and (3,4)?)

BTW, this code already works in the Rakudo implementation of Perl 6...
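For readers who don't speak Perl 6, roughly the same steps in Python. This is a loose translation under my own naming (`get_slots`, `cuts`), with `random.choices` standing in for `pick(..., :repl)`; treat it as a sketch rather than an exact equivalent.

```python
import random


def get_slots(slots, max_value):
    # slots - 1 random cut points in 1..max_value-1, drawn with replacement, sorted
    cuts = sorted(random.choices(range(1, max_value), k=slots - 1))
    # (cuts followed by max_value) minus (0 followed by cuts), element by element
    return [hi - lo for hi, lo in zip(cuts + [max_value], [0] + cuts)]


sizes = get_slots(5, 100)
print(sizes, sum(sizes))  # five non-negative sizes that add up to 100
```

The `zip` of the two shifted lists makes the overlapping pairing visible at the call site, which is the point about not hiding it inside a helper.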

jacquesm · on Oct 17, 2009

Type safety is vastly overrated.

Dealing with polymorphism in a clever way (say, by using duck typing) gets you much the same effect. The only price you pay is that the errors shift from compile time to run time, and that's unacceptable in some situations.
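A tiny Python illustration of that trade-off. The classes are made up for the example; the point is only where the error shows up.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


class Rock:
    pass  # deliberately has no speak() method


def chorus(things):
    # No declared types: anything with a speak() method is accepted.
    return [thing.speak() for thing in things]


print(chorus([Duck(), Robot()]))  # ['Quack', 'Beep']

try:
    chorus([Duck(), Rock()])
except AttributeError as err:
    # The mistake is only discovered when the offending line executes.
    print("caught at run time:", err)
```

A statically checked language would reject the `Rock` call before the program ever ran; here it surfaces only if that code path is exercised.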

The mess above is a typical example of taking something to an extreme.

I remember there was a coding standard somewhere (I think it was Microsoft's) that required you to put the initial letter of the type of every parameter in the function names.

People come up with something ('typing' or 'everything is an object') and then try to shoehorn each and every issue they have into that shape, even if it is totally not applicable.

viraptor · on Oct 17, 2009

> Type safety is vastly overrated.

I don't think it is. Explicitly listing types is definitely overrated, though. Basically the same function in Haskell:

    pairwise (a:b:xs) f = (f a b):(pairwise (b:xs) f)
    pairwise _ _ = []

What does ghci know about it now?

    pairwise :: [t] -> (t -> t -> a) -> [a]

Without any user annotation - magic! :) Just in case - mapping to the original types from the article:

    TResult == a
    IEnumerable<TResult> == [a]
    TSource == t
    Func<TSource, TSource, TResult> == t -> t -> a

Shorter notation and a bit of type inference can make that example a lot easier to read. And it is completely type-safe, which can save you in case the function you pass in tries to do something stupid.

jacquesm · on Oct 17, 2009

I don't know anything about Haskell other than a bit of its history and main philosophy, but if I understood that example right, then basically you say how things can combine, and if you combine them in 'wrong' ways it gets caught without having to name every possible instance of such combinations?

Or did I misunderstand you?

viraptor · on Oct 17, 2009

I'm not sure this answer is what you're looking for, but:

The function does the same thing as the Pairwise in the article. What Haskell does is "guess" at what the types of each variable / function can be, which is fairly easy in this case. For example pairwise expects a list in the first argument (because it's split into "a", "b" and the rest of the items), and some other argument "f". "f" is called with "a" and "b" as arguments (which are of the same type, because they come from the same list) and returns some result. The return value of "pairwise" is constructed from either results of "f" or an empty list, so the result must be a list of the items that "f" returns. etc.

So the compiler itself can figure out all the types needed in that function without any help from the user. (basically we could do it in the C# case too, even if someone removed all the types) Haskell will happily compile that function with the signature it found out and will let you run "pairwise [1,2,3,4] (\x y -> x-y)". But it will not even compile your program if you try to use a number as the first parameter or a function with different signature in the second. It's just a function with generic arguments and some limits on what you can use ("f" has to take both arguments of the same type - you can't use "Char -> Int -> a" there).

You're still allowed to run `pairwise "abc" (\x y -> x:[y])` and will get `["ab","bc"]` - however there's no dynamic duck typing involved.

So my point was that there's nothing bad about the static typing itself. In most cases you don't need to write all the types yourself - they can be easily guessed and it's much better if the compiler does it for you.
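For comparison, here is the same signature spelled out with Python's optional type hints. Unlike ghci, Python infers none of this: the annotations are written by hand and are only checked if you run an external type checker, which is an assumption about your setup. The names `T` and `A` are mine, chosen to line up with the mapping above.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")  # plays the role of Haskell's t / C#'s TSource
A = TypeVar("A")  # plays the role of Haskell's a / C#'s TResult


def pairwise(xs: Iterable[T], f: Callable[[T, T], A]) -> Iterator[A]:
    it = iter(xs)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: no pairs to produce
    for cur in it:
        yield f(prev, cur)
        prev = cur


print(list(pairwise([1, 2, 3, 4], lambda x, y: x - y)))  # [-1, -1, -1]
print(list(pairwise("abc", lambda x, y: x + y)))  # ['ab', 'bc']
```

The annotated line carries the same information as the C# declaration, just in a lighter notation; the Haskell version gets it without anyone writing it down.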

jacquesm · on Oct 17, 2009

Right, that's what type inference is all about. Impressive that they can take it that far without any 'input' from the programmer; that's something that should make its way into other languages.

Thanks for the detailed explanation.

jerf · on Oct 17, 2009

Hindley–Milner type inference is a mathematically sound type inference system. Given certain not-entirely-unreasonable constraints, it is always correct. Given a program written entirely within those constraints, HM will be fully correct, no matter how large you make it.

Unfortunately, the constraints are indeed a little limiting. Haskell offers many advanced features that technically break HM type inference. Thus, in Haskell it is considered good practice to always label your functions with their type signatures. (Human-made ones are often better; the compiler may have been more generic than you actually want (though still correct), or you may be able to better "spell" the type signature using your type synonyms that mean something to humans.) Then, even if there is a function that manages to confuse the type inferencer, the damage is contained.

It's fairly easy to do the type annotation, too; nobody has to know that what you did was leave it off, load it into the compiler, and ask the interpreter what the inferred type was. :) I'm sure Haskell experts don't ever have to do that, but I'm still learning and there have definitely been times where I could write a function that worked as I expected but I couldn't quite follow the type. So far I'm not confusing the type inferencer very often; mostly it just manifests as types I'd "spell" differently but that are equivalent.

blasdel · on Oct 17, 2009

> Thus, in Haskell it is considered good practice to always label your functions with their type signatures.

Haskell is 20 years old, and this has been made 'good practice' only very recently -- I'd date it to dons's revival blitz a few years ago. A professor of mine who'd been using Haskell from the beginning was annoyed by their use.

blasdel · on Oct 17, 2009

> that's something that should make its way into other languages.

How do you think javac knows when you get the types wrong? It's inferring types internally, it just makes you repeat yourself out of fealty to Bondage and Discipline.

ricree · on Oct 17, 2009

"that's something that should make its way into other languages"

It's starting to, to some extent. C#, for example, has had inferred types available since 3.0, though it still requires using the "var" keyword when declaring variables.

mechanical_fish · on Oct 17, 2009

The convention you refer to is "Hungarian Notation", and Spolsky's essay on the subject is required reading before you get too vehement about its stupidity. Basically, it started out as a relatively useful convention, then got misunderstood and misapplied and morphed into a ridiculous version of itself.

I wonder sometimes if the same thing has happened to type safety. I'm not convinced it is a bad idea, actually. I am convinced that it's not well-served by having Java as its poster child. If my mental picture of a type-safe language is something that reads like this, no wonder I'm inclined to think it's overrated.

jacquesm · on Oct 17, 2009

The ideas behind Hungarian are sound, but one of the reasons the name 'Hungarian' stuck was that the names were unpronounceable.

Modern IDEs make that whole thing moot anyway, but the notation is still being used afaik.

There is nothing wrong with type safety per se; it's just that when it gets to the point where it is literally 'in the way of progress', things have been taken too far.

Ideally it's something that's there when you need it, and overridable when you specifically don't need it.

That way you can do your type checking at compile time for the 70% or so of the code where that makes good sense, and where you're happy to have those guardrails to save you some time catching stupid bugs.

The other 30% of your code you can be as free as you want to be knowing full well that the price of failure would be a very angry end user.

That's because I think there are multiple 'modes' of programming. If you're writing a one-off data import script then type safety is just a real pain, but as soon as you want to do that every day at a few thousand locations you'd better have it bulletproof.

Now you need two languages for those two situations.

mdemare · on Oct 17, 2009

This isn't Java, this is C#. (In Java this would be more verbose).

mechanical_fish · on Oct 17, 2009

I'm aware of that. I chose to pick on Java because I assume, perhaps too generously, that the C# designers are constrained by the need to market their language to Java developers, such that when Java heads over a cliff the C# folks are obliged to follow, though perhaps with a more elegant series of aerial stunts on the way down.

The risk is that I unfairly maligned Java, a language which I do not know well anymore, since I abandoned it about when it started to look like this. But according to you I'm being too generous to them as well. :)

imbaczek · on Oct 17, 2009

> the C# designers are constrained by the need to market their language to Java developers

that was true only up to about C# 2.0; Java is way, way behind nowadays.

coliveira · on Oct 17, 2009

> C# designers are constrained by the need to market their language to Java developers

No, because C# had generics before Java.

weavejester · on Oct 17, 2009

Well, the C# 2.0 specification was written in 2002, but the compiler was only released in 2005, a year after Sun released J2SE 5.0. So from an implementation perspective, Java had generics before C#.

mechanical_fish · on Oct 17, 2009

Oops. My ire is misplaced, then.

mdemare · on Oct 17, 2009

I'm not sure if C#/Java's type safety has morphed into a ridiculous version of itself. I suspect that there has been too little, rather than too much, development in this regard. Type inference and typedefs would go a long way toward solving the excesses of type safety, certainly in this case.

jrockway · on Oct 17, 2009

Java has no type safety, since every object can be null, and methods can't be called on null. This means your program can blow up at runtime due to type errors.

fnid · on Oct 17, 2009

You should qualify, "Type safety is vastly overrated -- today." There was a time when type safety was very valuable to the performance and functionality of programming. Computers are so fast now that the performance hit is almost negligible, but this wasn't always the case and dynamic languages were very very slow compared to their strictly typed brethren.

Even within the same language, strict typing and dynamic typing had big gaps in performance.

blasdel · on Oct 17, 2009

Please read http://www.pphsg.org/cdsmith/types.html -- None of [safety, strict, dynamic, typing] mean what you think they mean.

Your most fundamental error is that 'dynamic' and 'strict' are not mutually exclusive in the slightest. In fact, almost all dynamically type-checked language implementations are vastly more strict (and safe) than any of the non-HM static type checking implementations.

Confusion · on Oct 17, 2009

Type safety does not require illegible code. You could express this more nicely in, even, Scala.

jauco · on Oct 17, 2009

I really value your opinion on most programming matters but I have to respectfully disagree. I can understand someone having a problem with C#'s syntax in general. But in this case I had more trouble verifying the algorithm than reading the type annotations. They might require some careful reading, but you have to keep track of only two types, so the cognitive load is minimal. More importantly, writing these function definitions makes using the functions really easy (auto-complete support etc.)

That being said, I don't really like the tone of the code's author, who poses the question on his blog with fake honesty, hoping everyone will laud his clever solution.

swombat · on Oct 17, 2009

Functional abstraction in general is not too clever. This particular code example, though, absolutely is. As mechanical_fish pointed out, it's pretty much illegible.

DRY is only one of the principles of good code, and not the highest one. The highest one would be that the code must work. The second highest that the code must be clear to the reader.

I warmly recommend this excellent talk by Marcel Molina Jr at RubyConf 2007, about what makes code beautiful:

http://rubyconf2007.confreaks.com/d1t1p1_what_makes_code_bea...

He gives a good example of a case where using too many abstractions results in unclear, and hence ugly code.

weavejester · on Oct 17, 2009

Huh? Why is it illegible?

jrockway · on Oct 17, 2009

Because the reader doesn't know the language it's written in. Note that "illegible" is a property of the reader, in this case, not of the code itself.

fnid · on Oct 17, 2009

Too clever is not determined by some property of the code, but rather the audience who reads it.