Do we really need threads? From my limited Ruby experience, it'll happily fork n...

tinco · on Dec 19, 2012

Although you are right about threads only ever scaling so far, you need to remember that network I/O has a rather large overhead.

If you always assume your code is going to be run over a network, you might miss an opportunity to efficiently solve problems that fit on a single machine with a bunch of cores.

I think frameworks like Celluloid allow you to deal with this elegantly, but they need help from the language to realize this potential, which is why bascule requests these features.

An example: a computer game might be built concurrently by having the rendering system, the two physics engines, any AIs and the main game loop execute on separate threads. Obviously there is a lot of information to be shared between these systems with as little delay as possible.
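To make the game example a bit more concrete, here is a minimal sketch of two subsystems sharing state inside one process. Everything in it is an assumption for illustration: the `run_simulation` method, the `state` hash and its keys are hypothetical, and a real engine would share far richer data. It only shows that threads can read and update the same objects directly, guarded by a `Mutex`, with no serialization step.

```ruby
# Hypothetical sketch: a "physics" thread and an "AI" thread share one
# in-memory state hash. All names here are illustrative, not from a real game.
def run_simulation(ticks)
  state = { tick: 0, positions: [], ai_reads: 0 }
  lock  = Mutex.new

  # Physics thread: advances the tick counter and records a position per tick.
  physics = Thread.new do
    ticks.times do |i|
      lock.synchronize do
        state[:tick] += 1
        state[:positions] << i * 1.5
      end
    end
  end

  # AI thread: works on the same hash directly, no copying or network hop.
  ai = Thread.new do
    ticks.times do
      lock.synchronize { state[:ai_reads] += 1 }
    end
  end

  [physics, ai].each(&:join)
  state
end

result = run_simulation(3)
puts result[:tick]
```

The lock keeps the updates consistent; how much real parallelism you get still depends on the Ruby implementation you run this on.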

sliverstorm · on Dec 19, 2012

Simply put, if you map out storage levels like this:

L1 -> L2 -> (L3) -> Memory -> Disk/Network

These differ by orders of magnitude in performance. Network can be faster than disk, but not generally by an order of magnitude.

So, everything you know about memory vs. disk for performance ought to translate fairly well to memory vs. network.

It's a good observation that extremely performance-bound jobs might want to look to other languages, but avoiding a level of that data storage hierarchy is no meager 2-3x speedup.

jeremyjh · on Dec 19, 2012

You are right that threads are not the only way to do it, but they do have advantages for some Ruby applications. The process model is expensive in terms of memory usage. Take a look at the Sidekiq testimonials for specific examples of this: https://github.com/mperham/sidekiq/wiki/Testimonials

If you can replace 10 servers with 1 server, that is not just a cost saving in terms of hosting; it also makes your deployments so much simpler that you may find yourself making changes more frequently, as just one example.

The process model also really falls down when the concurrent workers need to interact. That isn't the most common model we see in, for example, Rails applications. More often we are thinking of either request/response or batch processing, and in either case it's essentially a single thread that only coordinates with others through a database and maybe memcached or redis. When you have large real-time processes, the communication overhead can be very detrimental under the process model.
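A rough sketch of where that communication overhead comes from, assuming a Unix-like system where `fork` is available. The helper names `sum_in_child` and `sum_in_thread` are made up for this illustration. In the process version the result has to be serialized and pushed through a pipe; in the thread version it is simply handed back in memory.

```ruby
# Hypothetical helpers contrasting the two models; not taken from any library.

# Process model: the child computes the sum, then must serialize the result
# and send it through a pipe, because it does not share memory with the parent.
def sum_in_child(numbers)
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  pid = fork do
    reader.close
    writer.write(Marshal.dump(numbers.sum))
    writer.close
  end

  writer.close
  result = Marshal.load(reader.read) # deserialize on the parent side
  reader.close
  Process.wait(pid)
  result
end

# Thread model: the result is an ordinary object in shared memory,
# so Thread#value returns it without any serialization.
def sum_in_thread(numbers)
  Thread.new { numbers.sum }.value
end

puts sum_in_child([1, 2, 3])
puts sum_in_thread([1, 2, 3])
```

For a single integer the difference is negligible. The cost shows up when large or frequently changing data has to cross the process boundary over and over.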

You are right, though, that eventually you have to take that hit in some form or fashion to scale to another box, but I don't agree that you should take it up front for every process you ever develop. That said, I think Erlang is closer to this model (though it isn't always serializing on a network), and that has proven to be pretty successful and efficient when viewed at a macro scale.