This is really interesting. A hypothesis I have on Ruby is that people attribute dynamic typing to it being a productive language, but that Ruby is actually productive for other reasons, in spite of being dynamically typed.
With Crystal, at least when it matures a bit more, this hypothesis could be tested.
There are very logical reasons why dynamic typing at first appears better than static for Rubyists, reasons that I think don't hold up so well once you scratch the surface:
* many Rubyists came from Java, and that kind of typing does slow you down. You need a modern type system with at least local type inference (Crystal seems to have global type inference)
* dynamic typing does actually help develop things more quickly in some cases, definitely in the case of small code bases for newer developers. A developer only has to think about runtime. With static typing a developer also must think about compile-time types, which takes time to master and integrate into their development. The relative payoff of preventing bugs grows exponentially as the complexity of the code base increases and at least linearly with size of the code base.
There is a Ruby with static typing---it's called C#. Seriously, even though my background is all in non-MS stacks, I did some C# work a few years ago and found the language surprisingly nice, with a lot of the features I love from Ruby. It even has type inference like Crystal. (The libraries are not nearly as clean/consistent as Java's, though.) For folks in the Ruby/Python/Java world, it's something to look at.
As someone whose last 8 years of professional work have been largely split between...C# and Ruby, I have to disagree. I do agree about C# being quite nice; it's all the things Java should be but isn't. And it's evolving quickly in all the right ways.
But it isn't really like Ruby with static typing. The language isn't Rubyish (no mixins or blocks, for example). It makes you put everything in a class, a la Java. It can be verbose and is occasionally downright clunky (though syntactically it's categorically slicker than Java). The .NET ecosystem doesn't have the Ruby characteristic of lots of small, fast-evolving libraries that are easy to use. In fact, the C# open source ecosystem is kinda poor in general and not a huge part of most developers' lives, whereas Ruby's ecosystem is vibrant and an integral part of its coding culture.
Another way to put all that is that if C# were purely dynamically typed, it wouldn't feel anything like Ruby.
I do see what you're saying: LINQ feels like a static (and lazy!) version of Ruby's Enumerable module, the lambdas look similar, C# actually does have optional dynamic typing, and C# is increasingly full of nice developer-friendly features. In general, I'm a fan. But switching between them doesn't feel like just a static/dynamic change.
Yeah, I mostly agree with that and sort of said so in my post. Not sure it has much impact on my central point though.
One really key difference with LINQ is that it doesn't produce arrays (or dictionaries, as in your example); it produces Enumerators, which you then have to call ToList() or ToDictionary() on. That laziness is actually an awesome feature and my favorite thing about LINQ, because it can massively improve performance by shortcutting work and not creating intermediate arrays. You can even work on infinite sequences with it. Besides performance, it's just tastier. It's so great I actually wrote a Ruby library to imitate it: https://github.com/icambron/lazer
Is LINQ really fast/performant though? Wouldn't the above expression cause three sequential loops to run?
One of the biggest performance issues I've seen with modern .NET code is people abusing LINQ and lambdas. Chaining functions like this is most decidedly not fast. I once wrote a library that had to do some heavy signal processing on large data sets, and since I wanted to ship the first version as soon as possible, I just used LINQ in a lot of functions to save time. It wasn't very performant, so later I rewrote most of the functions to use standard native code such as loops for iteration, hashmaps for caching and all sorts of improvements like that. I completely got rid of LINQ in that version and for many functions the runtime went down from something like 500ms-1000ms to the microsecond range.
So sure, LINQ makes development fast and it's very nice to be able to write code such as .Skip(10).Take(50).Where(x => ...). On most web projects, it won't make a huge difference. I've seen Rails "developers" use ActiveRecord in such a way that they would create double and triple nested loops and then hit the database multiple times by using enumerable functions on ActiveRecord objects without realizing how this works, what's going on behind the curtains and so on. I've seen .NET devs do similar things using EntityFramework.
So yeah, it's convenient and all, but it can also be very dangerous when used by someone who doesn't understand the fundamentals behind these principles.
> Wouldn't the above expression cause three sequential loops to run?
No, it wouldn't; that's the really important point about LINQ I was, clumsily, trying to express above [1]. Take this admittedly totally contrived example:
someList
.Where(i => i % 2 == 0)
.Select(i => i + 7)
.Take(5)
This is not equivalent to a bunch of sequential loops. What it is is a bunch of nested Enumerators. Here's how it works. It gets the list's Enumerator, which is an interface that has a MoveNext() method and a Current property. In this case, MoveNext() just retrieves the next element of the list. Then the Where() call wraps that enumerator with another enumerator [2], but this time its implementation of MoveNext() calls the wrapped MoveNext() until it finds a number divisible by 2, and then sets its Current property to that. That enumerator is wrapped with one whose MoveNext() calls underlying.MoveNext() and sets Current to underlying.Current + 7. Take just stops (its MoveNext() returns false) after 5 underlying MoveNext() calls.
All of that returns an enumerable, so as written above it actually hasn't done any real work yet. It's just wrapped some stuff in some other stuff.
Once we walk the enumerable--either by putting a foreach around it or by calling ToList() on it--we start processing list elements. But they come through one at a time as these MoveNext() calls bring through the list items; think of them as working from the inside out, with each MoveNext() call asking for one item, however that layer of the onion has defined "one item". The item is pulled up through the chain, only "leaving" the original list when it's needed. The entire list is traversed at most once, and in our example, possibly far less: the Take(5) stops calling MoveNext() after it's received 5 values, so we stop processing the list after that happens. If someList were the list of natural numbers, we'd only read the first 10 values from the list.
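To make the wrapping concrete, here's a rough sketch of a hand-rolled, Where()-style operator written as an iterator method. MyWhere is a made-up name, and the real LINQ operators are extension methods with more edge-case handling, but the deferred, one-item-at-a-time shape is the same:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class EnumeratorSketch
    {
        // Calling this does no work; it just returns an object whose MoveNext()
        // pulls one item at a time from the source enumerator.
        static IEnumerable<T> MyWhere<T>(IEnumerable<T> source, Func<T, bool> pred)
        {
            foreach (var item in source)   // each pass is one underlying MoveNext()
                if (pred(item))
                    yield return item;     // hand exactly one item up the chain
        }

        static void Demo(List<int> someList)
        {
            var chain = MyWhere(someList, i => i % 2 == 0).Select(i => i + 7).Take(5);
            // Nothing has been enumerated yet; the work happens here, item by item:
            var results = chain.ToList();
        }
    }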
Now, those nested Enumerator calls aren't completely free, but they're not bad either, and you certainly shouldn't be seeing a one second vs microseconds difference. If you craft the chain correctly, it's functionally equivalent to having all of the right short circuitry in the manual for-loop version, and obviously it's way nicer.
So why are you seeing such poor perf on your LINQ chains? Hard to say without looking at them, but a few pointers: (1) Never call ToList() or ToDictionary() until the end of your chain, or anything else that would prematurely "eat" the enumerable. (2) Order the chain so that filters that eliminate the most items go near the front of the chain, similar to how you'd put their equivalent if (...) continue; checks at the beginning of your loop body. (3) Just be cognizant of how LINQ chains actually work.
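To illustrate pointer (1) with a tiny made-up example (someList is whatever list you're filtering):

    // Bad: materializes every even number before taking 5 of them
    var slow = someList.Where(i => i % 2 == 0).ToList().Take(5).ToList();

    // Good: stays lazy the whole way; stops pulling items after 5 matches
    var fast = someList.Where(i => i % 2 == 0).Take(5).ToList();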
[1] In the example in the parent, FindAll isn't actually a LINQ method, so there is one extra loop in there. Always use Where() if you're chaining; use FindAll() when you want a simple List -> List transformation.
[2] A detail elided here: each level actually returns an Enumerable and the layer wrapping it does a GetEnumerator() call on that.
Thank you for this awesome explanation, it really clarifies how Enumerable methods work. I guess this is the real reason behind deferred execution. Still, I can't help thinking there is a big overhead involved with using such methods. We did some benchmarks in the past and the code that does the same thing manually always ended up being much faster.
The nice thing about Enumerable methods is that they can significantly speed up development and most projects won't suffer for it. However, for speed critical code it's probably not the best tool in the box.
I have a very similar background/opinion. Almost all of my professional work has been on the *nix stack and I generally like Ruby. During a brief stint in Microsoft-land I found myself super impressed with C#. First class functions, type inference, anonymous types/functions and LINQ make for a really nice general purpose language. It's really a shame it's mired in the Microsoft toolchain.
> (The libraries are not nearly as clean/consistent as Java though.)
I've had the complete opposite experience. I've found that when it comes to libraries, C# has fewer, but higher-quality, ones than Java. I've also found C# to have a much more intuitive standard library. In C#, I can often just figure out how a standard library class works purely through the type system and the IDE, while in Java, I'd have to search through documentation more frequently.
> A hypothesis I have on Ruby is that people attribute dynamic typing to it being a productive language, but that Ruby is actually productive for other reasons, in spite of being dynamically typed.
That has always been my reaction to most comments about dynamically typed scripting languages, including Python and Ruby. Most of the time, turning compile-time type errors into runtime exceptions is not a feature.
Yes, but the upside is you avoid the complexity headaches of interfaces and covariance/contravariance and generics and all that stuff.
Heck, look at all that FactoryFactoryFactory stuff you have to deal with when you want to swap out core parts of a framework - you end up with config files and XML and you have to make sure the guy who made the original framework designed it to allow you to change the part you want to change with your modular swap... in a dynamic language? Monkey patch. It's ugly, but it works.
Heck, look at serialization. If you want to serialize/deserialize static objects, you need metadata that includes the types of everything - stuff like XSD in XML. Dynamic languages don't need that stuff, which is part of the modern popularity of JSON... JavaScript and its buddies just play nicer with JSON. I actually wish there were a popular simplified analogue to XSD for JSON, because I miss the ease of serializing into objects that you get using XML/XSD in C# or JSON in JavaScript.
The dynamic-ish nature of exceptions, which seem like an unholy abominable hole in the type system in static languages (or a source of unending agony in Java's checked exceptions), suddenly fits nicely into a dynamically typed language paradigm. Python embraces an "easier to ask forgiveness than permission" approach, throwing exceptions willy-nilly, and it makes for nice clean code.
Plus, working in a dynamically typed language heavily discourages premature optimization because you already threw performance out the window.
But yeah, you're basically working without a net, and that kinda sucks.
> Yes, but the upside is you avoid the complexity headaches of interfaces and covariance/contravariance and generics and all that stuff.
Maybe it's a matter of what you are accustomed to, but not having explicitly declared types gives me headaches. I find code without types horribly unreadable - because I can't see at first glance what data a function processes, etc.
Also dynamic typing and its runtime type checking gives me that uncanny feeling of "something might be wrong but I won't find out till I hit it".
You can easily do that in a statically typed language as well; you just have to do it explicitly. For instance, Haskell has the error function, of type "String -> a"; you can use it absolutely anywhere, no matter the expected type, give it an error message, and if the value it produces is ever actually used, it throws an exception with the specified message.
Don't discount the value of doing it implicitly. When you're just throwing stuff together (and there is a place for that), it's a distraction to work around parts of the system that won't work yet, both the typing and the mental overhead.
Having done this both with Clojure and Erlang, I find the unversioned, dynamic mode of achieving this is fraught with trouble. It's really easy to end up in complex atypical states during upgrade, which can at best be hard to debug and at worst corrupt other, longer-term state.
Exactly. I think what a lot of people miss is that Erlang makes live upgrades possible but certainly not easy. Trying to upgrade a complex application one component at a time (the way Erlang/OTP promotes) is still insanely hard and not worth the pain unless you're working in a domain where you have absolutely no choice (which is of course why Ericsson developed Erlang in the first place). For the vast majority of applications the right choice is to restart the server and accept a few seconds of downtime.
I think with better documentation I'd actually call Erlang's model "easy" in that it's really easy to do things correctly with absolutely minimum chance of corruption/failure/non-repeatability. OTP gives you the tooling—releases, versions thereof, and simultaneous multiple module tenancy being key—to actually do a good job. That's a rare thing.
I started out writing disagreements to your points, somehow having misread them as arguments you support for Rubyists preferring dynamic typing, but then during editing I re-read that you think it's productive in spite of dynamic types. I agree totally.
I don't think Java-style typing is that much of a hindrance. It's irritating boilerplate, but people using those languages can slam it out very quickly.
I don't think reasoning about runtime types is any easier than reasoning about compile-time types; it's in fact a higher cognitive load, because you cannot ignore it and rely on a type-checking phase that covers all your paths without explicit test cases.
I personally found Ruby to be productive[1] due to the expressive metaprogramming, how easy it is to make DSLs, blocks and yield for CPS, generators, and co-routines, and how everything is re-definable. I don't know how much dynamic typing factors into that, but I think if you could get the same things with equally expressive syntax, Rubyists would still like it.
[1] It used to be my favorite. I still like it (and love it for scripting), but prefer GADTs and pattern-matching on type constructors now.
User-defined static types are a theory of a solution. But mostly we don't know what the hell we are doing, so we don't have a correct theory. And this is right, because instead of perfecting our theory, we should be adding the next new feature or new product.
Exceptions include frequently reused code (libraries, components, frameworks) and well-specified problems (rewriting a known problem, implementing an algorithm, shuttle-like high-risk projects). Here, static types are also useful as documentation.
As Brooks said: plan to throw one away; you will, anyhow. i.e. code to understand, then to solve. You don't understand it well enough to have a correct theory the first time, and it's less feasible to rewrite a project from scratch the larger it is.
I was very excited by this when I read the description, because a compiled language that looks like Ruby is exactly what I've wanted. Unfortunately I'm not super excited by the quirks of the implementation. For example:
if some_condition
a = 1
else
a = 1.5
end
If I'm working in a compiled and typed language, the last thing I want is a language that automatically gives union types to variables. As far as I'm concerned, the type inference should fail at this point. In the above example, now I'm forcing the compiler to maintain a union which is going to have a pretty significant overhead on every computation the variable `a` is involved in.
But isn't that how type inference works in most statically typed languages?
For example, in Scala:
val x = if (some_condition) new Employer else new Employee
If Employer and Employee both derive from Person, then x will be of type Person. The run time uses dynamic dispatch to figure out how members are accessed from x.
If you want to constrain x, you need to specify the type explicitly.
Your example shows a different kind of behavior than the parent's example. A union type (t1 \/ t2) is a type that says "either this value has type t1, or it has type t2".
In your example, the type of x is not (Employer \/ Employee), it is their shared superclass - Person. The analogous example would be if
val x = if (some_condition) new Employer else new Employee
succeeded even though Employer and Employee did not share a superclass. Very few languages use union types - Typed Racket comes to mind, and Algol apparently did too.
But actually in Crystal you will have the same behaviour:
class Person
end
class Employer < Person
end
class Employee < Person
end
x = some_condition ? Employer.new : Employee.new
# x is a Person+
This is not said in the "happy birthday" article (or anywhere else, IIRC).
In the beginning we typed x as Employer | Employee. But as the hierarchy grew bigger, compile times became huge. Then we decided to let x be the lowest superclass of all the types in the union (and mark it with a "+", meaning: it's this class, or any subclass). This made compile times much faster, and in most (if not all) cases this is what you want when you assign different types under the same hierarchy to a variable.
What this does mean, though, is that the following won't compile:
# Yes, there are abstract classes in Crystal
abstract class Animal
end
class Dog < Animal
def talk
end
end
class Cat < Animal
def talk
end
end
class Mouse < Animal
end
x = foo ? Dog.new : Cat.new
x.talk # undefined method 'talk' for Mouse
That is, even though "x" is never assigned a Mouse, Crystal infers the type of "x" to be Animal+, so it really doesn't know which subclasses are in play and considers all cases.
Again, this is most of the time a good thing: if you introduce a new class in your hierarchy, you probably want it to respond to the same methods as the other classes in the hierarchy.
Well, the happy birthday article has a section on "union types", and that code block has the comment "# Here a can be an Int32 or Float64". I just assumed that this meant a had the type Int32 | Float64. If the language doesn't actually have union types, then the article should probably be edited to reflect that (because it's very misleading on this issue).
It has union types. Right now if you do 1 || 1.5 it gives you Int32 | Float64. If you do Foo.new || Bar.new, and Foo and Bar are not related (except they both inherit from Reference), then you get Foo | Bar. If Foo and Bar are related by some superclass Super, then you get Super+.
If you do:
a = [1, 'a', 1.5, "hello"]
you get Array(Int32 | Char | Float64 | String)
In a way, the Super+ type is a union type of all the subtypes of Super, including itself, but just with a shorter name.
If your variable gets promoted to a union type, and all of the types respond to the methods you call on it, then it will compile and run successfully (duck typing). Yes, it will have a small performance cost. If you then profile your app and find that the performance problem is that one, you go and fix it.
You can always have a static analyzer tool (that works, because Crystal is compiled) that can pin-point all the locations of union types. You can then put some type restrictions wherever you need them, to know where the union types come from.
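For instance, a made-up method with a restriction on its argument (just a sketch; half isn't from any real API):

    # If a union-typed value reaches this call, the compiler rejects it right here,
    # instead of silently carrying the union through the rest of the program.
    def half(x : Float64)
      x / 2
    end

    half(1.5)       # fine
    half(1 || 1.5)  # compile-time error: the argument's type is Int32 | Float64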
The idea is that you can start prototyping something that works and is quite fast, and later you can always improve the performance without ever having to write C code.
Also, the union of an int and a float will probably be just a float, so it won't have any performance overhead.
The one that caught my eye is that the macros are disappointing: they merely feel like (not even glorified) syntactic sugar for an eval wrapped in a function.
Hint: if I'm manipulating/interpolating a string, it's an eval, not a (lisp-y) macro.
I guess it's also worth mentioning Mirah, a JVM language heavily inspired by Ruby. Unlike JRuby, it compiles down to bytecode ahead of time and needs no support libraries.
I'm putting my faith in Ruby. It might take 10 years, but eventually the performance will resemble C's. It's basically a compile-to-C language right now as it is. There's just a whole bunch of inefficiencies in the implementation. Once they get ironed out, we'll finally be able to have our cake and eat it too. One language to rule them all.
"Just a whole bunch of inefficiencies in the implementation."
That's a common misconception: what makes a language "faster" than another is not only more or fewer optimizations in the compiler or efficiencies in the runtime. There are language features that just kill performance.
To write fast code, you have to use the right data structures (impossible if everything is a hash map), manage memory (possible even with garbage collection), and be able to inline everything in the inner loops (introspection, eval, etc. make this hard).
In case you haven't seen this Charles Nutter post from a few months back, you may find it interesting. The TL;DR is that dynamic features in languages come at a cost (but he says "prove me wrong").
Well, the compile-to-C thing is one thing. It's possible to compile any language down to C. You could compile Brainfuck or Python down to C if you wanted. The question is how complicated the resultant code will end up being compared to pure C. I would only call Ruby a "compile-to-C" language if it could generate code that is at least somewhat comparable to sane C code. Here's some C code that sums all the integers in an array:
long sum(int *a, int len)
{
long ret = 0;
for (int i = 0; i < len; i++)
ret += a[i];
return ret;
}
The entire loop (the conditional test, the increment, and the `ret` update) could probably be implemented in less than 10 native instructions depending on your machine. Faaaaast. If Ruby were a compile-to-C language, I would expect it to produce C code that looked somewhat like this. So let's look at the same snippet in Ruby:
# sum the first len elements of a
def sum(a, len)
ret = i = 0
while i < len
ret += a[i]
i += 1
end
ret
end
(This is far from being idiomatic Ruby code, but this solution is the simplest and it also seems like it would be the easiest to directly translate to C.) Semantically, here's roughly what that would have to translate to.
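In a C-like pseudocode, it has to look something like this (the names obj_t, dispatch, int_to_obj and is_truthy are made up, but the shape is the point):

    /* every value is a boxed object; nothing is a plain machine int */
    obj_t *sum(obj_t *a, obj_t *len)
    {
        obj_t *ret = int_to_obj(0);
        obj_t *i   = int_to_obj(0);

        /* while i < len  -->  dispatch the '<' method and test truthiness */
        while (is_truthy(dispatch(i, "<", len))) {
            /* ret += a[i]  -->  dispatch '[]' (bounds + type checks),
               then dispatch '+' (type checks, overflow check, may promote
               to a big integer, may raise) */
            obj_t *elem = dispatch(a, "[]", i);
            ret = dispatch(ret, "+", elem);
            i = dispatch(i, "+", int_to_obj(1));
        }
        return ret;  /* and the GC has been tracking every one of these boxes */
    }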
Why is this so complicated? Because I've captured Ruby's dynamic typing and dynamic dispatch within the function itself. `ret` isn't a long, it's a Ruby variable that can hold any type of object, so we need to capture that in the source. Same with `i`, `a`, and `len`. When we say `a[i]`, we're not jumping to the `i`th element of the integer array `a`, which would be super fast. Instead, we have to dynamically dispatch the `[]` method, which will perform bounds checking and a bunch of type-checking. We also have to dynamically dispatch the `<` and `+` methods everywhere, which perform type-checking themselves. Obviously, this all takes much, much more than 10 native instructions. You can't generally optimize out the method dispatches, since Ruby lets you redefine methods of built-in classes wherever you want. You might be able to perform some static analysis to get rid of some dynamic types, but you have to be careful with machine integers, since they overflow without warning. You'd have to check after every operation that it didn't overflow, and switch it out for a big integer if that happens. Any of these methods could raise exceptions, and that's a nontrivial problem to deal with. The garbage collector is also running in the background.
And this is just a simple example, too. Things get a hell of a lot more complicated when you introduce blocks and dynamic scoping (which I purposefully stayed away from). So that should paint a somewhat clear picture of why it's not just an issue of waiting 10 years until Ruby gets as fast as C. I don't know how close it's even possible to get without messing with the semantics of the language.
Well, Rubinius is quite a bit faster than MRI Ruby (still not as fast as C). And although I don't like your long prognosis, I love the attitude!
However, I have never seen a problem with writing C extensions for Ruby when I am pressed for speed. The only complaint I can think of is that my first extension was "problematic" in terms of not being able to find any resources, which made me resort to reading the source, which in turn made me a better Rubyist!
I think what's ultimately going to be needed is better profiling tools and an easy-to-use extension DSL that's designed to hook into Ruby. That way we could experiment with different structures than hash maps and still keep good Ruby style.
No, Ruby has too many difficulties. LuaJIT is closest to C performance right now, there are innumerable discussions about why Ruby/Python are much harder/impossible to do this for.
Dynamic typing is simply sugar. It's a bit misleading (particularly for newer developers) to think you don't have to think about types just because you don't have to declare them. Automatic coercion, etc., is really only useful if you're a more experienced developer, because for the most part you keep the type information in your head.
That said, python is the most productive language I've used so far but type management is really just one part of that.
This is great. Now if we need parallelism we can choose Rubinius and if we require raw speed we can choose Crystal (when it's done) instead of jumping on the JRuby bandwagon.
I respect Charles Oliver Nutter but Java is something I want less of in my life. This seems like a great alternative for people seeking performant Ruby interpreters.
It doesn't attempt to be different from Ruby, at least in its look and feel. Crystal strives to be as similar to Ruby as possible. The creator stated some reasons why they created Crystal on the GitHub repo.
Which also implies performance benefits over Ruby. I think another way to view it is Go with Ruby syntax (although Go has its own runtime overhead). Their introduction document [1] is a pretty concise overview.
I think it's an interesting project. But not being a Ruby focused coder these days, I can't see myself choosing this over other compiled languages at this point.