2012: The year Rubyists learned to stop worrying and love threads (tonyarcieri.com)
99 points by bascule on Dec 18, 2012 | 26 comments



Ruby's big problem with concurrency is the mutability of everything. Ruby just loves mutable state; it is in its blood. Not even the constants are really constant, and not even the classes make any promises about the future.

Embracing concurrency would mean compromising there: in your code you would have to acknowledge that there are variables you could reference but shouldn't, because they aren't thread-safe. This goes against the idea that Ruby is a beautiful abstract garden where everything is possible.

These deep_dup and deep_freeze proposals make it easy for the programmer to create safe objects, but they don't make it any harder to use unsafe ones. I think this is why they haven't been accepted into Ruby yet, and perhaps never will be: they solve a problem Ruby does not want to take on, for the same reason Ruby won't adopt a memory model that takes concurrency into account.

In my opinion, the only way Ruby should ever integrate threads into the language is by introducing a way to start a second thread that executes either a string or a file. It could return an object that allows sending messages to this spawned thread. The message-send method itself might perform deep_dup or deep_freeze on the objects it receives (without needing to expose deep_dup/deep_freeze as methods).

You might complain that evalling a string or loading a file seems like an evil way of going about things, but it is the only way of introducing code into Ruby that does not close over its surrounding scope.

An alternative to evalling would be to introduce non-closure blocks, but I think their existence might break the principle of least surprise.

edit: btw, this idea of spawning a second thread that returns an object which can be used to send objects to another thread could already be implemented using Ruby's fork method and a handle to some shared memory or a pipe.
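
A rough sketch of that fork-and-pipe approach (names here are just for illustration): messages are marshalled across the pipe, so the child only ever works on copies, which gives you the deep_dup-on-send semantics for free.

    reader, writer = IO.pipe

    pid = fork do
      writer.close
      until reader.eof?
        msg = Marshal.load(reader)   # the child only ever sees a copy
        puts "worker received: #{msg.inspect}"
      end
    end

    reader.close
    writer.write(Marshal.dump([1, 2, 3]))   # the "dup" happens implicitly via serialisation
    writer.close
    Process.wait(pid)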

edit: is there something particularly untrue about what I'm saying? is it worth a downvote?


Not sure why somebody would downvote you? You seem correct to me.

The ability to spawn new global contexts and communicate only immutable objects between them (message passing) is fundamental to actor systems.

Unfortunately most modern scripting languages do not make it easy or cheap to spawn new global contexts. I hope this changes in the near future. (Lua is an exception, I believe)


JRuby makes it easy to start as many scripting containers as you want:

http://jruby.org/apidocs/org/jruby/embed/ScriptingContainer....
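
From code already running on JRuby it looks roughly like this, if I remember the embed API correctly (untested sketch; I believe you need a non-singleton LocalContextScope to get fully isolated runtimes):

    require 'java'

    # each container gets its own runtime, so constants/classes defined in
    # one are invisible to the others
    scope = org.jruby.embed.LocalContextScope::SINGLETHREAD

    container = org.jruby.embed.ScriptingContainer.new(scope)
    container.run_scriptlet("puts 'hello from a separate global context'")

    other = org.jruby.embed.ScriptingContainer.new(scope)
    other.run_scriptlet("FOO = 42")   # not visible to the first container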


Awesome. What's the minimum overhead required for each global context?


This was a much more thorough article than I was expecting; I'll have to bookmark it for later.

From the OP: > At the end of the conference, Evan Phoenix sat down with Matz and asked him various questions posed by the conference attendees. One of these questions was about the GIL and why such a substantial “two dot oh” style release didn’t try to do something more ambitious like removing the GIL and enabling multicore execution. Matz looked a bit flustered by it, and said “I’m not the threading guy”.

The fact that a lot of Ruby's development (at least MRI's) is conducted in a language totally incomprehensible to me is part of my fascination with Ruby...I remember there being some discussion a while back about translating Matz's original Ruby documentation for historical purposes...as it is now, some of that design and thought process is probably still locked away in Japanese. I'm sure he's discussed it in postings and at conferences since, but did Matz have any kind of intractable philosophical objection to threading, other than the ton of work involved? That is, did he or anyone on the MRI team think it would take Ruby too far from its original design goals?


I don't think the GIL will change, especially with JRuby as a viable option. I think Matz is interested in linguistic expression (ways to write code) and not GIL/performance issues. I'm not justifying the decision either way. We might see it someday, I hope.


Also, people experimented with removing the GIL in Python and did not get any benefit from it. http://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remova... looks at some of the issues involved.


That was really interesting, especially this part:

'Reference counting is a really lousy memory-management technique for free-threading. This was already widely known, but the performance numbers put a more concrete figure on it. This will definitely be the most challenging issue for anyone attempting a GIL removal patch.'

If ref counting is so bad with threads, how does Objective-C do it performantly?


While I've not measured the performance of the approaches, from reading the Python patch discussed in the article it would appear that Objective-C uses a more intelligent approach to maintaining the reference count in the face of concurrent manipulation.

The patch to Python involves guarding every increment and decrement of a reference count with a single pthread mutex. This pthread mutex would become a major source of contention if multiple threads are attempting operations that manipulate the reference count. Pthread mutexes are also a relatively heavyweight synchronization mechanism, and their overhead would impact performance even when the single mutex was uncontended.

In contrast, Objective-C uses more efficient means of managing the reference count. The implementation of -[NSObject retain] uses spinlocks to guard the side tables that hold the reference counts. There are multiple such side tables and associated spinlocks in order to reduce contention if multiple threads are attempting to manipulate the reference counts of different objects. CoreFoundation, which provides the implementations of many common types such as strings and arrays, uses an inline reference count that is manipulated using atomic compare-and-swap operations. This reduces contention at the cost of increasing the storage size of every object of those types.
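
In plain Ruby, a toy analogue of the side-table idea looks roughly like this - this is not the actual Objective-C implementation, and it uses Mutex where the runtime would use spinlocks, but it shows why striping the counts across several guarded tables reduces contention compared with one global lock around every retain/release:

    require 'thread'

    STRIPES = 8
    SIDE_TABLES = Array.new(STRIPES) { { lock: Mutex.new, counts: Hash.new(0) } }

    def retain(obj)
      table = SIDE_TABLES[obj.object_id % STRIPES]
      table[:lock].synchronize { table[:counts][obj.object_id] += 1 }
    end

    def release(obj)
      table = SIDE_TABLES[obj.object_id % STRIPES]
      table[:lock].synchronize { table[:counts][obj.object_id] -= 1 }
    end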


I think it mostly is by not counting as much. In typical Objective-C code, you will find that only the UI is actual Objective-C. Also, many fields of Cocoa UI classes are 'plain old data' such as 'int', 'BOOL' or enums. That keeps the number of objects down and decreases the amount of bookkeeping.

The GUI library is also smart enough not to allocate more objects than needed. For example, in a table, only the cells actually on screen really exist, and all controls in a single window share an NSTextView called 'the field editor' that is used for editing text (http://developer.apple.com/library/mac/documentation/Cocoa/C...)

Finally, I do not think it is that fast. It is just modern hardware that is fast.


A few things:

1. Objective-C doesn't do automatic runtime reference counting the way Python does AFAIK. You either do it yourself where needed, or it's automatically inserted and heavily optimized at compile time by ARC. (I could have misunderstood how Python does it, but I think it does more refcounting than Objective-C does.)

2. Although Objective-C is pretty fast in the grand scheme of things, using its object system does entail a performance penalty when compared to similar languages like plain C or C++.

3. The garbage collector, despite being really immature and pretty quickly falling out of favor, actually did often give better performance in heavily threaded situations.


I'm not a computer scientist, but I speculate it's due to Objective-C's compilation with LLVM.


I really like Tony's article and appreciate all the work he has done on Celluloid. I am in the early stages of writing a multi-threaded server app using Celluloid and Hamster as the basic libraries to deal with concurrency. So far I have found them to be idiomatic and pleasurable to use. It may actually be somewhat of a drawback, but Celluloid really can get out of the way to the extent that you would not even realize, as the client of a particular object API, that there is a message-based proxy in the middle of things. Still, I like it that I don't have a lot of infrastructure and ceremony in my code just to be safely concurrent.
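
For anyone who hasn't seen it, the "proxy in the middle" looks roughly like this (from memory of the current Celluloid API, so details may be slightly off):

    require 'celluloid'

    class Counter
      include Celluloid        # instances now live inside their own actor thread

      def initialize
        @count = 0
      end

      def increment
        @count += 1
      end

      def count
        @count
      end
    end

    counter = Counter.new      # looks like a plain object...
    counter.increment          # ...but this is a synchronous message to the actor
    counter.async.increment    # fire-and-forget message
    puts counter.count         # state is only ever touched by the actor's own thread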

The GIL is like a bogeyman still hanging over our heads. It's important to remember, though, that we still get a lot of concurrency in MRI; if you are I/O bound you may not see a difference. My core app is not I/O bound, and I predict I'll see enough benefit in JRuby to use it. I find JRuby to be slow in development, but library support is good, and I'm presently planning to unit-test in JRuby in parallel and deploy with it from the beginning.


I don't want to be "that Haskell guy", but that's all I could read from this. Realistic multithreading and immutability are deeply tied. I'm very interested to see how far the MRI community can get toward decent multithreading by implementing suggestions such as these... since my learned intuition is to just throw out mutability and plan within that much simpler, more limited sandbox.


It's funny, the article doesn't really even contain anything about threading. It just has a bunch of band-aid solutions for tacking immutable message passing on top of global mutable shared state.


Only one of the proposals had anything to do with immutability.


Huh? Deep Freeze, Deep Dup, and Ownership Transfer are all strategies to avoid multiple concurrent actors mutating the same objects at the same time.

Even the last proposal, which I think has to do with fine grained locking, is still a strategy to avoid issues with mutable shared state.


Do we really need threads? From my limited Ruby experience, it'll happily fork new interpreters, and it has connectivity with pretty much all major message queue implementations as well as various serialisation and networking libraries. In short, talking to other processes is easy, even if it is a bit slower than threads (but if speed is such an issue, it's unlikely Ruby would be your implementation language).
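
For example, cross-process method calls with nothing but the standard library's DRb (the URI and class here are made up for the sketch):

    # --- server process ---
    require 'drb/drb'

    class Adder
      def add(a, b)
        a + b
      end
    end

    DRb.start_service('druby://localhost:8787', Adder.new)
    DRb.thread.join

    # --- client process ---
    require 'drb/drb'

    DRb.start_service
    adder = DRbObject.new_with_uri('druby://localhost:8787')
    puts adder.add(1, 2)   # the call crosses the process boundary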

Threads only ever scale so far; when you need more processor cycles you'll have to go off-host eventually. By adopting a multi-process model with data shared over the network (with or without a broker queue in between) you can greatly improve the app's ability to scale, and its robustness.

For the non-compute-intensive reasons to parallelise (e.g. chatty networking code), non-blocking code often performs better than threads anyway.

If threads aren't great (they aren't in Python), forget about them and move on. There are other tools in the toolbox, with the bonus that the other tools are actually better (in most if not all cases on unix like platforms).


Although you are right about threads only ever scaling so far, you need to remember that network I/O has a rather large overhead.

If you always assume your code is going to be run over a network you might miss an opportunity to efficiently solve some problems that might be solved on just a single machine with a bunch of cores.

I think frameworks like Celluloid allow you to deal with this elegantly, but they need help from the language to realize this potential, which is why bascule requests these features.

An example: a computer game might be built concurrently by having the rendering system, the two physics engines, any AIs and the main game loop execute on separate threads. Obviously there is a bunch of information to be shared between these systems with as little delay as possible.


Simply put, if you map out storage levels like this:

L1 -> L2 -> (L3) -> Memory -> Disk/Network

These are orders of magnitude different in performance. Network can be faster than disk, but not generally by an order of magnitude.

So, everything you know about memory vs. disk for performance ought to translate fairly well to memory vs. network.

It's a good observation that extremely performance-bound jobs might want to look to other languages, but avoiding a level of that data storage hierarchy is no meager 2-3x speedup.


You are right that threads are not the only way to do it, but they do offer some advantages for some Ruby applications. The process model is expensive in terms of memory usage. Take a look at the Sidekiq testimonials for specific examples of this: https://github.com/mperham/sidekiq/wiki/Testimonials

If you can replace 10 servers with 1 server, that is not just a cost saving in terms of hosting; it also makes your deployments so much simpler that you may find yourself making changes more frequently, just as one example.

The process model also really falls down when those threads interact. That isn't the most common model we see in, for example, Rails applications - more often we are thinking of either request/response or batch processing; in either case it's essentially a single thread that only coordinates with others through a database and maybe memcached or Redis. When you have large real-time processes, the communication overhead can be very detrimental to the process model.

You are right, though, that eventually you have to take that hit in some form or fashion to scale to another box, but I don't agree you should take it up front for every process you ever develop. That said, I think Erlang is closer to this model (though it isn't always serializing onto a network), and it has proven to be pretty successful and efficient when viewed at a macro scale.


I'm not an expert in this domain, but wouldn't the threading issues that have impeded Python (and the removal of the Python GIL) also impede Ruby in the same way? I've heard solutions like "freezing" and ownership transfer before, but they're always more complex than they seem.

Thanks


In short, yes.

The longer answer is linked in a post above[1] - it describes the problems with Python (CPython), many of which would apply to Ruby (CRuby/MRI) as well.

[1] http://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remova...


I would love that, but I don't see a sensible way to get there.

deep_dup and deep_freeze solutions would have to dup/freeze the entire object graph of the object in question, and this would have to include classes and modules as well, including the Object, Class, and Module classes. This would probably become a huge object graph.
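
For illustration, a naive deep_freeze might look something like this (hypothetical method, not part of Ruby) - and note that it deliberately stops short of walking obj.class, which is exactly the part that blows the graph up:

    # Naive, hypothetical deep_freeze: walks instance variables and common
    # collections, but does NOT descend into obj.class - following classes
    # and modules too is what makes the full object graph so large.
    def deep_freeze(obj, seen = {})
      return obj if seen[obj.object_id]
      seen[obj.object_id] = true

      obj.instance_variables.each do |ivar|
        deep_freeze(obj.instance_variable_get(ivar), seen)
      end
      case obj
      when Array then obj.each { |e| deep_freeze(e, seen) }
      when Hash  then obj.each { |k, v| deep_freeze(k, seen); deep_freeze(v, seen) }
      end

      obj.freeze
    end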

One way to prevent this could be to explicitly freeze such objects at some point during startup. This would still break a lot of code in the Rails world, where dynamically adding methods to a class is just standard practice.
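
Roughly speaking, something like the following at boot - which is exactly what would break all that Rails-style class reopening:

    # Rough sketch of "freeze the class/module graph once the app has booted".
    # Any later monkey-patch or dynamically defined method would raise.
    ObjectSpace.each_object(Module) { |mod| mod.freeze }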

Another way could be to implement copy-on-write semantics for such (and other) objects - if two threads share, say, a Class object, and one thread modifies it, the modification should then manifest itself only in that thread's copy of the class.


There's no reason that you would need to freeze anything but the state. Things like classes represent the function associated with that state. I'd generally say runtime modifications to the class hierarchy are BAD BAD BAD and you should never do them and you should feel bad when you do, but that's a separate concern from concurrent state mutation. Detractors of OOP might wave their hands and say OOP conflates function and state, but really they're cleanly separated: it's the difference between (meta)class and instance.

Concurrent languages like Erlang allow you to swap functions at runtime even though their state is immutable.


Great post, great explanations and my quote of the day:

"Well Matz, I’m a “threading guy” and I have some ideas ;)"



