REIA language designer on Twitter/Ruby/Scala discussion

tptacek · on April 5, 2009

I like Tony Arcieri, but this is a weak and somewhat unseemly argument, easily knocked down:

(1) Most urgently, Alex isn't bashing Ruby. He's at pains to point out that Twitter continues to use and like it. Pointing out the relative strengths of other languages isn't "bashing".

(2) The technical problems Alex has with Ruby are bona fide well known problems with Ruby. Put aside the green threads debacle and the MRI VM is still a dealbreaker. I have lots of long-lived EventMachine code, and MRI simply isn't up to the task. Arcieri even stipulates to this in his post.

(3) In deriding Kestrel, Arcieri is ignoring several key facts that Twitter had to face when they built it: first, that they were already down the path of a house-built message queue (they didn't go to Scala to build Starling, which moots his critique of it); second, that apart from Apache, none of the competing projects were mature enough for Twitter to commit to at the moment in time we're bickering about; and third, that his preferred queue (RabbitMQ) would have required them to commit to yet a third exotic platform, Erlang, the Ron Paul of programming environments, and would have won them no more meaningful performance than they got with Scala, but would have robbed them of JVM compatibility.

It's the third point that rankles me the most. In a 24-graf jeremiad, we find, only 10 grafs in, that Arcieri can't make a performance-based critique of anything but a straw man (Starling), and can only do it by himself rejecting Ruby in favor of Erlang. This is the best the Ruby community can do to answer Alex's argument?

jeremymcanally · on April 5, 2009

(1) I think the rub isn't so much that they're bashing Ruby as that it seems like, rather than evaluating all their options, they just jumped ship to some random unproven technology for whatever reason. They then turn around and say "well we switched because Ruby didn't do what we needed" (but check out my book on the language that does!!). It makes one think they don't know if it did or didn't, since JRuby would've solved a goodly number of their problems (probably; we'll never really know). Just like when they wrote Starling, they seem to have just decided to hack something out rather than make a reasoned technical decision.

(2) Yes, but JRuby and Ruby 1.9 both handle those problems much better. Moot argument.

(3) So rather than picking an "unproven" technology (by some measure of unproven), they write something completely new in a comparatively immature programming language. Yup. Much better choice.

And since when did JVM compatibility matter? The reason they chose the JVM was good threading and so on, but Erlang has that same support. It wouldn't be introducing a third "exotic" platform, but a second, different platform.

Of course, I'm saying this stuff from an outside perspective. I'm hoping he elaborates on his blog, because I would really be interested to hear a more technical explanation of their decisions.

tptacek · on April 5, 2009

I really think you're going to lose this argument. You're defending a blog post that says that Kestrel was a far worse choice than a single-developer C project with no major success stories. I don't think, and I don't believe that you think, that Scala is as likely to be a failure mode for Twitter as MQ is.

However immature Scala is --- and I'm not using it --- the Scala runtime is absolutely rock solid. I'm sure that's true of JRuby as well, but the comparison isn't between JRuby and Scala, it's between MRI and Scala, and for a company that tolerated high-volume messaging servers in MRI, we both know Scala is going to be like shangri-la by comparison.

jeremymcanally · on April 5, 2009

I'm not defending the post, but countering your points. If my points happen to line up with his opinions, then that's merely coincidence. There are a lot of MQ options out there, many of which I've used with great success (with Ruby no less). To argue that one of those is less acceptable than a home grown solution in Scala is, at best, dubious.

But you just danced around the real question: why not JRuby? Scala's runtime == JRuby's runtime. They're both JVM languages, and if they were to use JRuby, there wouldn't be some big crazy rewrite. The only difference would've been "jruby mq.rb" rather than "ruby mq.rb." That's the decision that hasn't really been explained.

Even further, I'm not sure why you're insinuating that I think they should write a message queue in MRI. Either use JRuby or use something else. I totally agree MRI is not acceptable for something like this (but 1.9 may be; I haven't tried it but its performance is only a hair slower than JRuby), but there are other ways to solve the same problem that don't involve rewriting tons of code (either by using JRuby or by using a proven, solid drop in replacement, possibly with an API shim if they really needed it).

ankhmoop · on April 5, 2009

JRuby is not a 1:1 mapping of Ruby to Java bytecode -- there's significant additional book-keeping that must be done by JRuby's runtime (for example, maintaining the Ruby call frames).

In contrast, Scala maps to the JVM as closely as possible. Scala classes are Java classes -- Scala and Java are bidirectionally interoperable, and Scala's performance consequently benefits.

jamesbritt · on April 6, 2009

"This is the best the Ruby community can do to answer Alex's argument?"

One of the more annoying aspects of this debate is how often it is portrayed as happening on behalf of some imaginary unified "Ruby community."

There is no Ruby community. There are numerous cliques and clusters and crowds.

The individuals speaking up are just that; they are not members of some special Ruby Community Leader cadre or any other such nonsense.

defunkt · on April 4, 2009

> As I perhaps somewhat self-aggrandizingly consider myself one of the most knowledgeable people regarding I/O in the Ruby world, I decided to peek around the Starling source and see what I discovered. What I found was a half-assed and pathetically underperforming reinvention of EventMachine, an event-based networking framework for Ruby which is the Ruby answer to the Twisted framework from Python.

The non-evented Starling was multi-threaded and used a thread pool to manage connections. I'm unclear on how that is a 'half-assed implementation of EventMachine' and not simply a multi-threaded network daemon.
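
For anyone following the distinction, this is roughly what a thread-pool network daemon looks like. It is a minimal Python sketch, not Starling's code: the names `handle`, `serve`, and `start_echo_server`, the echo behaviour, and the `max_conns` knob are all assumptions made for illustration. Each accepted connection is handed to a pool thread that does plain blocking reads and writes, whereas an evented framework multiplexes many sockets on one thread with callbacks.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Blocking per-connection handler: echo each chunk back until EOF."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def serve(listener, workers=4, max_conns=None):
    """Accept connections and hand each one to a pool thread.

    max_conns is a hypothetical knob so the sketch can stop after a
    fixed number of accepts instead of looping forever.
    """
    accepted = 0
    with listener, ThreadPoolExecutor(max_workers=workers) as pool:
        while max_conns is None or accepted < max_conns:
            conn, _addr = listener.accept()
            pool.submit(handle, conn)
            accepted += 1


def start_echo_server(workers=4, max_conns=None):
    """Bind to an ephemeral loopback port and run serve() in a thread."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]
    thread = threading.Thread(
        target=serve, args=(listener, workers, max_conns), daemon=True
    )
    thread.start()
    return port, thread
```

Nothing here resembles a reactor loop: concurrency comes from the pool size alone, and a slow client ties up one worker thread for as long as it stays connected.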

tptacek · on April 5, 2009

EventMachine supports an actor-style threading model, which may be what Tony was comparing it to; pretty clearly, Tony Arcieri knows EventMachine and async programming --- he's one of the better known people in the EventMachine "community".

defunkt · on April 6, 2009

Does that make every multi-threaded network daemon that manages a thread pool of connections a 'half-assed implementation of EventMachine' then?

(I know who Tony is. I co-wrote Evented Starling.)

tptacek · on April 6, 2009

Ok. It wasn't clear from your comment whether you were saying Arcieri didn't know what he was talking about. Sorry.

andr · on April 4, 2009

Message queues are something the financial industry has been getting right for years (even down to custom hardware-based implementations), so the author makes a good point that it's stupid to reinvent them yourself if you don't have the experience.

njharman · on April 4, 2009

His analysis of Starling reinforces an impression I got of Twitter devs from articles/discussions back when they were having lots of uptime/scaling issues. The impression was they weren't that experienced or all that good.

It makes me curious how many early startups aren't composed of rockstar devs. How much (if at all) timing, luck, marketing matter more than dev ability at the beginning.

I tend to think it's not nearly as important to have experienced rockstars from day 1. It's not until you get enough success to become famous and start attracting experienced rockstars that it becomes critical to recognize and hire them.

jacktang · on April 4, 2009

While complaining about Ruby, why not make some language-level contributions? Or is it very cheap for Twitter to rewrite the whole stack? I am wondering...

mechanical_fish · on April 4, 2009

[Note: I can't read the original link -- the site is down -- so I have no idea what the original submitter said. But let me take a guess about what you're saying.]

Try to put yourself in Twitter's shoes. Your viral app is a fantastic, unprecedented success. Your traffic is doubling every week. The Fail Whale is onscreen so much that it has its own name, its own fan club, and its own T-shirts. Techcrunch is rumbling about all the other entrepreneurs who are setting up to clone your service.

The idea that a language-level change to Ruby is a wise thing to pursue at this point is insane. Ruby has a big and complicated code base. You are not a language designer. You probably won't even figure out what you could do that would help. If you do, the change will probably result in an internal-only fork of Ruby that can't be reliably patched and that is incompatible with a random cross-section of your third-party libraries. Deploy that thing and you will be finding and fixing Heisenbugs all over the codebase for the next six months.

To actually get an official change into Rails takes months, minimum. In the case of Ruby that might stretch into a year or two. Because you must first win a series of online arguments, and then you must wait for lots and lots of third parties to test your change against their apps and libraries and report or fix the bugs.

Yep, much cheaper to just rewrite your whole stack using different infrastructure. Several times, if necessary, as experiments. Twitter is expert at rebuilding their own stack -- what has been done before is easier to do again.

jacktang · on April 4, 2009

> I can't read the original link -- the site is down

Well, it is up. You might need an HTTP proxy to read the article.

> To actually get an official change into Rails takes months, minimum. In the case of Ruby that might stretch into a year or two.

Twitter can obviously fork the Ruby code base and maintain their own branch if they like.

grandalf · on April 4, 2009

Great point about Twitter never explaining why it didn't just use one of the many awesome open source message queues already in existence.

simonw · on April 4, 2009

He suggests the following message queues:

http://www.rabbitmq.com/ - first version 8th February 2007, but the first version not to have "alpha" or "beta" status was 1.5.1 released 21st January 2009

http://memcachedb.org/memcacheq/ - version 0.1.1 released 26th November 2008

http://www.ejabberd.im/ - not really a message queue

http://xph.us/software/beanstalkd/ - first public release 11th December 2007, hit 1.0 28th May 2008

http://activemq.apache.org/ - not sure when it was first released but the mailing list goes back to December 2005

The first public release of Starling (Twitter's first custom message queue) was 10th January 2008. Presumably they had it running internally for a while before they released it.

From this, we can see that when they built their own pretty much the only realistic open source option was ActiveMQ, which can hardly be described as a light-weight solution (not to mention it still doesn't have a stellar reputation under high loads). When the alternatives aren't rock solid yet, rolling your own (where at least you understand all of the code and how it works) seems like a perfectly practical alternative.

evgen · on April 4, 2009

While it is possible that Starling had been running internally before its release, this does not excuse overlooking RabbitMQ. A software package that had proven itself in real-world scaling and been designed by people with real experience in the problem domain (cf. the financial services world) is going to be much better at "alpha" or "beta" quality than Starling is going to be even after the Twitter devs hammer at it for a couple of years. The Twitter devs were starting from scratch, writing something that other people out there actually had some experience with, and decided to not take an existing solution and fix/adapt it to their needs.

tptacek · on April 5, 2009

I take issue with the idea that the financial services world has real experience in Twitter's problem domain. My experience with the financial services world is significant technically, but casual in a career sense. That said:

I think hi-fi devs make lots of stupid decisions in the name of performance. In the few cases where their actual outcomes match up to their posturing, it's because their code is obsessively cobbled around one specific use case they've been working on since 1989.

Have you ever read an order management system, or looked at Tibco Rendezvous on the wire?

Most of the hi-fi companies adopting MQ are built around straight AMQ, and bare-metal performance was out the window long before they bolted their crappy WebSphere app onto it. What these companies are looking for is predictability, not performance, and their problem sets are much simpler and more stable than Twitter's.

simonw · on April 4, 2009

RabbitMQ was less than a year old, and significantly more fully featured than what Twitter needed. Fixing bugs in that would be a whole lot harder than fixing bugs in 1500 lines of code they wrote in a language they knew.

johnbender · on April 4, 2009

Yes, but what's the rationale for doing it now in Scala?

simonw · on April 4, 2009

At a guess, a few reasons. Firstly, they had everything else written against a message queue with particular behaviour - so better to upgrade that queue than switch to a completely new one and have to rewrite everything that interacts with it.

Secondly, after running a custom message queue for well over a year they know EXACTLY what they need from one, so writing their own still makes sense.

Thirdly, if they're going to start moving other core bits of Twitter infrastructure to Scala it makes sense to try it out with an important piece of the puzzle that they thoroughly understand first.

And finally, Twitter's core competency is delivering messages. As such, it's really not so extreme to use their own software to do that - they work at a high enough scale that they need to be experts in whatever solution they are using.

As Douglas Crockford once said, "The good thing about reinventing the wheel is that you can get a round one." The fact that Twitter's reliability over the past 6-12 months has been a huge improvement, despite the enormous growth the service has seen (it's mentioned in the mainstream media all the time), would suggest that their decision to roll their own message queue paid off.

tptacek · on April 5, 2009

It's also worth noting that the one queue Arcieri proposed that might have been interface-compatible --- MemcacheQ --- is a single-developer side project written in C. Arcieri is "completely confused" by the fact that Twitter didn't adopt this as the core of their service.
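
For context on "interface-compatible": Starling and MemcacheQ both speak the memcached protocol, with `set` appending to a named queue and `get` consuming from it. A minimal in-memory Python sketch of those semantics follows. The class name `MemcacheStyleQueue` is hypothetical, and this omits the wire protocol and persistence entirely, so treat it as an illustration of the interface rather than either project's implementation.

```python
from collections import defaultdict, deque


class MemcacheStyleQueue:
    """Named FIFO queues behind a memcached-like set/get interface."""

    def __init__(self):
        # One FIFO per key; the key plays the role of the queue name.
        self._queues = defaultdict(deque)

    def set(self, key, value):
        """Enqueue: a 'set' appends instead of overwriting."""
        self._queues[key].append(value)
        return True

    def get(self, key):
        """Dequeue: a 'get' removes and returns the oldest item, or None."""
        queue = self._queues.get(key)
        if not queue:
            return None
        return queue.popleft()
```

Because clients only see `set` and `get`, any server that honours these semantics can in principle be swapped in behind existing client code, which is the sense of compatibility at issue here.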

NealR · on April 6, 2009

This is a straw-man argument. MemcacheQ is a straightforward mash-up of two very stable software stacks: memcached and BerkeleyDB. The lines of code to accomplish it are trivial. So what.

tptacek · on April 7, 2009

It's 4000 lines of C code, not counting headers. Try again.

NealR · on April 9, 2009

No. The bdb.c file is 800 lines of trivial, near-BDB-example-level code. The rest of it is from memcached core. Either way it's solid. Read the code.

mechanical_fish · on April 4, 2009

I can't read the original article right now. So I must poke around in the darkness when trying to respond. Sorry if I'm being unfair.

But it sounds like we're once again enmeshed in the never-ending process of second-guessing Twitter. Let me offer a pertinent quote, from the Hugo-winning novel The Vor Game:

> [You did] a right thing. Perhaps not the best of all possible right things. Three days from now you may think of a cleverer tactic, but you were the man on the ground at the time. I try not to second-guess my field commanders.

Twitter solved their problem. It is unlikely that they did so as elegantly as possible, and it's quite possible that their logic at the time will be of no use to the rest of us at all, because it was based on incomplete information about a rapidly-changing software universe.

jeremyw · on April 4, 2009

Without second-guessing them, it remains a fun curiosity that at their size and failure rate they continue to reject vendor expertise in this space. If Starling and Kestrel reinvent little, what can we learn about this counterintuitive behavior, especially re management and investors?

intranation · on April 4, 2009

I believe they call that the "not invented here" phenomenon.

samt · on April 4, 2009

Having evaluated oss message queues for high volume web services I can see exactly how twitter might make this decision.

- activemq: high resource usage, not super stable at high loads
- rabbitmq: messages must ALL fit into memory. Your message queue should be a "rescue me" button - if all your dbs go down, just queue crap to disk. You do not want to worry about memory overflow in that situation.
- memcacheq: super stable, low resource consumption but fixed message size (padding for smaller messages).

We ended up using memcacheq but it's hardly what you would call full featured, so we'll have to do prioritization, etc in our code.
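
To make the "queue to disk" and "fixed message size" trade-offs concrete, here is a minimal Python sketch of a disk-backed FIFO with equal-sized, padded slots. It is an assumption-laden illustration, not how MemcacheQ stores data (MemcacheQ sits on BerkeleyDB): the class name `FixedSlotDiskQueue`, the 4-byte length header, and the in-memory read cursor are all invented for the example.

```python
import os
import struct


class FixedSlotDiskQueue:
    """Append-only FIFO kept in a file of equal-sized slots."""

    HEADER = struct.Struct(">I")  # 4-byte big-endian payload length

    def __init__(self, path, slot_size=64):
        self.path = path
        self.slot_size = slot_size
        self.max_payload = slot_size - self.HEADER.size
        self._read_slot = 0  # read cursor lives in memory only (assumption)
        open(path, "ab").close()  # create the file if it does not exist

    def put(self, payload):
        """Write one message, padded with zero bytes to a full slot."""
        if len(payload) > self.max_payload:
            raise ValueError("message larger than the fixed slot size")
        record = self.HEADER.pack(len(payload)) + payload
        record = record.ljust(self.slot_size, b"\x00")
        with open(self.path, "ab") as f:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before returning

    def get(self):
        """Return the oldest unread message, or None if the queue is drained."""
        with open(self.path, "rb") as f:
            f.seek(self._read_slot * self.slot_size)
            record = f.read(self.slot_size)
        if len(record) < self.slot_size:
            return None
        (length,) = self.HEADER.unpack_from(record)
        self._read_slot += 1
        start = self.HEADER.size
        return record[start:start + length]
```

The padding is the cost being described: a 2-byte message still occupies a whole slot on disk, and anything bigger than the slot is rejected. In exchange the backlog is bounded by disk rather than memory, and locating the Nth message is a single seek.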

tptacek · on April 5, 2009

Awesome comment, thank you.

jjames · on April 4, 2009

I'm getting a 404 message that is trying to sell me novelty gifts.