Twitter's problems have never been Rails problems (evanweaver.com)
51 points by sant0sk1 on July 10, 2008 | 44 comments



The only problem with Rails is all the armchair scalability experts and backseat developers who read TechCrunch and use Twitter and think they know a thing or two about scaling. When Julia Allison goes on web radio and starts holding court on it, you know that whole meme has really jumped the shark.

Complaining about Rails' scalability is a really really good signal that you've never had to scale anything. If you had, then you'd already know what a small part your application framework plays in a scalable architecture.


Can you explain a little about the problem? It makes sense to me to store everything in the database and scale that. A solution to scaling the database seems to be horizontal partitioning where possible. Is this hard to do?

Am I missing something about why scaling frameworks like Rails or Django is hard?

On the topic of Twitter scaling, I can see how the heavy database activity here would cause scaling issues. Can anyone provide insight into how they solved this, if it wasn't something like horizontal partitioning?


It makes sense to me to store everything in the database and scale that... Is this hard to do?

Is Larry Ellison a multibillionaire?

I think it's impossible to explain database scaling and performance tuning in a handful of paragraphs, and I'm certainly not the one to try. If I were to try and tell you exactly what it is about Twitter that makes engineers cry, I'd point out that every single page is dynamic, every user's main page requires a giant JOIN, there are lots of writes coming in all the time from every direction and writes are harder to scale, low latency is a requirement for many people, and there's no obvious axis along which to "partition" Twitter.

For example, the PlentyOfFish guy talked about how he could split up his databases based on geography -- it's overwhelmingly likely that people in South Bend, Indiana want to look for dates within fifty miles of South Bend, Indiana, rather than in Spokane. But on Twitter I can follow anyone and anyone can follow me, so the giant JOIN that builds my homepage has to span the entire dataset.

And, sure, you can build a cache for every user, but then every user who sends a tweet to 1000 followers triggers the update of one thousand caches, with one thousand internal messages to one thousand event queues... and when one machine full of user caches goes down, what then? It's not acceptable to drop the message on the floor for a subset of users.

Some folks will solve this, but they will be better hackers than I, they will spend a lot of money on hardware, and they will drink a lot of coffee. And it will take time.
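
To make the "giant JOIN" and the cache fan-out concrete, here is a rough sketch in Python with sqlite3 -- the schema and names are invented for illustration, not Twitter's actual design. On the read side the naive home page is a join across the whole follow graph; on the write side the cache turns one tweet into a write per follower:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tweets  (id INTEGER PRIMARY KEY, user_id INTEGER,
                              body TEXT, created_at TEXT);
        CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
    """)

    def home_timeline(user_id, limit=20):
        # Naive approach: one big JOIN over the follow graph and the whole
        # tweet table, executed on every page view.
        return db.execute("""
            SELECT t.id, t.user_id, t.body
            FROM tweets t
            JOIN follows f ON f.followee_id = t.user_id
            WHERE f.follower_id = ?
            ORDER BY t.created_at DESC
            LIMIT ?
        """, (user_id, limit)).fetchall()

    # Caching alternative: fan out each new tweet to a per-follower cache.
    # One tweet from a user with 1000 followers means 1000 cache writes.
    timeline_cache = {}   # follower_id -> list of recent (author_id, body)

    def publish(author_id, body):
        followers = db.execute(
            "SELECT follower_id FROM follows WHERE followee_id = ?",
            (author_id,)).fetchall()
        for (follower_id,) in followers:
            timeline_cache.setdefault(follower_id, []).insert(0, (author_id, body))

Either way the cost lands somewhere: on read as a join over the entire dataset, or on write as one cache update per follower.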


If I were to try and tell you exactly what it is about Twitter that makes engineers cry, I'd point out that every single page is dynamic, every user's main page requires a giant JOIN, there are lots of writes coming in all the time from every direction and writes are harder to scale, low latency is a requirement for many people, and there's no obvious axis along which to "partition" Twitter.

That's more along the lines of what I was looking for. Thanks. This sounds tricky...

Just came across this link now: http://highscalability.com/scaling-twitter-making-twitter-10...


You're right, it's not hard up to a point.

That 'point' is where a world of troubles begins: the point where the number of database updates per second exceeds the capacity of your hardware platform.

At this point, like massive sites such as YouTube, Flickr, or Facebook, you are faced with the option of database sharding -- splitting your data up by value (i.e. users A-K on this server and L-Z on that server).

You can no longer treat the database as a black box which does all the magic of multi-table joins. Rails also does not provide any tools to help accomplish sharding out of the box.
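
As a minimal illustration of what "splitting by value" means for the application, here is a Python sketch (the hostnames and the A-K / L-Z split are invented): the app has to pick the right connection itself, and anything that used to be a cross-user join now has to be stitched together in application code.

    # Hypothetical shard map: each username range lives on its own server.
    SHARDS = {
        ("a", "k"): "db1.example.com",
        ("l", "z"): "db2.example.com",
    }

    def shard_for(username):
        # Route by the first letter of the username, as in the A-K / L-Z example.
        first = username[0].lower()
        for (lo, hi), host in SHARDS.items():
            if lo <= first <= hi:
                return host
        raise ValueError("no shard for %r" % username)

    # The application now has to know which box to talk to, e.g.
    #   conn = connect(shard_for("sant0sk1"))   # -> db2.example.com
    # and a query touching users on both shards becomes two queries
    # plus a merge in application code.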


Twitter is a multicast-capable software-based crossbar switch, with a necessary form of message journaling and replay.

Where the destination, or target, of the message is what selects and pulls the messages.

And with sparse addressing.

Twitter reverses the norms of routing; the receivers control the routing. It's a distributed forest of packet-cloning and routing servers.

And the entirety of the routing tables involved here are stonking huge.

I'd not look to Rails and typical databases here (at least not first); I'd approach this from a completely different direction. This application just doesn't fit into a classic database. You're not even sharding the right pieces if you're looking at the database(s).
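
One way to picture the "receivers pull" model is sketched below -- a toy in Python, not a claim about how Twitter actually works: each sender appends to its own journal, and a receiver's view is a merge across the journals of everyone it follows.

    import heapq
    from collections import defaultdict

    outbox = defaultdict(list)     # author -> [(seq, message), ...] journal
    following = defaultdict(set)   # receiver -> set of authors it follows
    _seq = 0

    def send(author, message):
        # Senders only append to their own journal; no per-follower fan-out.
        global _seq
        _seq += 1
        outbox[author].append((_seq, message))

    def timeline(receiver, limit=20):
        # The receiver does the routing: pull and merge the journals of
        # everyone it follows, newest first.
        merged = heapq.merge(
            *(reversed(outbox[a]) for a in following[receiver]), reverse=True)
        return [msg for _, msg in list(merged)[:limit]]

    following["alice"] = {"bob", "carol"}
    send("bob", "hello")
    send("carol", "world")
    print(timeline("alice"))   # ['world', 'hello']

Journaling and replay fall out of the same structure: the per-author logs are the journal, and rebuilding a receiver's view is just a re-merge.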


Humorously, he updated the post to highlight a comment that's "right on" -- a comment about a popular Rails site (insiderpages.com) that apparently handles high traffic with it. Except that the site is unavailable at the moment. Right on indeed!


works for me.


Are you sure it isn't a rails mindset problem?


Yes, indeed -- if by "rails mindset" you mean "build the minimum amount that will work - YAGNI; launch early to gauge interest and get feedback; use established, standard tools, methods, architectures and libraries when possible; premature optimization is the root of all evil; when you do hit a scaling problem, it will almost certainly be architectural in nature."

Of course, once your app explodes in popularity and you realize that it's more of a messaging service than a CRUD app, you've got a problem -- the Rails Mindset problem. But this is the problem that the Rails user wants to have. The problem the Rails user doesn't want to have is the one where you spend months learning how to engineer a scalable messaging app, launch it, and find that nobody cares and that it can't be marketed.

(And that was the obvious risk for Twitter at its inception. Everyone thought Twitter was a waste of time at first. I certainly wouldn't have wasted more than a few weeks building it, and I would have never imagined that I'd wind up struggling to move millions of messages a day.)

If you want a world where you don't have any problems at all... I don't know what to do, but don't found a startup. ;)

(Sorry you got downmodded, BTW... my advice is "be less terse". I tried the Zen Koan method of posting a few times when I started out here; it's unreliable.)


This is spot on.

Unless you raise a bunch of capital, affording you the time to try to build a scalable architecture before flipping the switch, it's just not feasible to spend time on problems that don't exist (and may never exist if your site doesn't take off).

The irony is that even if you did try to build an architecture that scales from day 1, there's a really good chance you'll have to redo it anyway once you spot the real bottlenecks.


Architecture != "a pile of kit in a datacentre".

System architecture is algorithmic in nature. There's no charge for having good architecture; you just need to put some thought in up front. In choosing to use Rails - and note this is NOT a knock on Rails - Twitter chose to side-step a large chunk of architectural work, and that choice - NOT the choice of Rails - is what bit them.


This, this is smart.


I wish I had more than one mod point to give you.

Scalability issues are a very nice problem to have.


I gave him one on your behalf. Now if you upmod this comment you will no longer owe me one ;)


Done! :-P

Now up-mod me again. Let me know when you've done it, and if we keep this up we might find out whether pg imposed any limit on commenting depth ;-)


now you both owe me.


... until you have them.


Not really. When you have them, that means you have lots of users. Any problem associated with lots of users is a great problem to have.


I can't tell you how many times I've wanted to punch people who've told me that in the face, back when I was sleeping 2 hours a day for a month.


Yeah, lack of sleep makes me irascible too.

Looking back today, would you have preferred a perfectly engineered system with no users?


I think the technically correct answer to your question is "door number three".

I'd watch out, though, because if I were blader I'd be sorely tempted to answer by punching you in the face. :) It's not like you didn't receive fair warning!


It's a fallacy to think that you can avoid all scalability problems. You can't. If you're successful, you will have scalability issues, period. So yes, since the problem is inevitable so long as you're successful, it is a nice problem to have.

It's a bit like having a cold. That's practically inevitable at some point in your life if you're alive. Well, I'd rather be alive with a cold than dead, even though having a cold sucks. If you don't want scalability issues, they're very easy to avoid: just build a crappy service. One of my start-ups never had any scalability issues, and never had a stampede of users either.


No, I don't think that's the problem. I think the problem is that they thought a CMS-style application would work. After all, Twitter is blogging, and Wordpress is a CMS-style app.

What they didn't realize is that Twitter isn't blogging, it's an IM service. I think there have been enough blog posts about this (so look there for details). What I don't understand is why Twitter hasn't been rewritten to be an IM system yet. They seem to think that turning off features (like Replies) is the solution to their problems.

It's not. Re. Write. It.


What I don't understand is why Twitter hasn't been rewritten to be an IM system yet.

Now's your chance. Twitter may never be more vulnerable than they are now.

(Though you might want to consider that it will take you months to catch up to the enormous marketing advantage that Twitter got by building the "wrong" architecture first, using Rails.)

But first, consider this use case: I hear that you belong to a certain IM system, so I go to the system's home page and type in your nickname. I'm presented with an up-to-date paged list of all the IMs you've ever sent, to anyone. I'm also presented with an up-to-date list of your followers (with cute little photos, all of which are up-to-date) and a list of those you follow, and by clicking any of their names I can get a list of all the IMs they have ever sent, to anyone.

I'm not much of an IM user, so maybe I'm just naive... but I don't know of an IM system that does that. Perhaps because this idea is really expensive to scale, so nobody at (e.g.) America Online has ever bothered to try, because it wasn't evident that such a thing would be valuable until the folks at Twitter launched a quick Rails app to try out the idea.


Who is Evan Weaver? How does he know what Twitter's problem is?


Evan Weaver is a Ruby programmer who Twitter hired a while back to help fix their scaling problems.


Thank you; that wasn't immediately evident by looking at his bio on his blog.


It isn't? http://blog.evanweaver.com/about/ says "I currently work at Twitter" and has a link to his resume, which says he started there in May.


Bizarre; I didn't see that there when I looked earlier.


Evan wrote bleak_house. Alongside the Nimble Method guys, he's one of the best-known Rails tuners. He knows what Twitter's problems are because they hired him to fix them.


Twitter isn't your standard Rails application. It was hacked out with Rails because it was easy with Rails. Nobody clearly understood that Twitter wasn't a traditional Rails app until it was too late.

The biggest scaling issue with frameworks (like Rails) that obscure the database operations, which tend to be the most expensive, is that inexperienced developers won't spend time picking through the logs, trying to find ways to eliminate or speed up queries.
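
The canonical example is the N+1 query: the framework makes the per-row lookup invisible, and only the query log reveals that one page view issued hundreds of SELECTs. A rough sketch of the same mistake and its fix, in plain Python with sqlite3 (schema invented for illustration):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    """)

    def page_slow(limit=20):
        # N+1: one query for the tweets, then one extra query per tweet.
        tweets = db.execute(
            "SELECT user_id, body FROM tweets LIMIT ?", (limit,)).fetchall()
        out = []
        for user_id, body in tweets:
            (name,) = db.execute(
                "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
            out.append((name, body))
        return out

    def page_fast(limit=20):
        # The same data in a single joined query.
        return db.execute("""
            SELECT u.name, t.body
            FROM tweets t JOIN users u ON u.id = t.user_id
            LIMIT ?
        """, (limit,)).fetchall()

From the application side the two look identical; only the logs show the difference.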


That's very true. I mean, the very same people who use Rails probably sneer at VB, but it's the same principle: you can trade development speed for runtime efficiency.

The important thing is to be aware when and why you are doing this.


One problem is that startups don't usually approach scaling like mature enterprises. Your users should not be your load test.

Twitter has money now. It's time for them to grow up: use a test environment that mirrors their live site, use something like LoadRunner, simulate the next level of traffic, and remove bottlenecks before that traffic level hits their live site.


Most startups, if they approached scaling like mature enterprises, would never have the opportunity to become a mature enterprise.


I agree with you--my point was not that Twitter should have initially approached scaling like large enterprises do--only that, as a natural consequence of smallness, small companies don't and aren't in the habit.

Twitter now has the resources and real user load. The window where performance problems are permissible closes eventually--see Friendster. Premature optimization is the root of all evil. But when your userbase is threatening to leave and dreaming up alternatives on their blogs daily, it's hard to call it premature.


You're making it sound like so-called "mature enterprises" approach scaling in a mature way. Perhaps "enterprises" like Yahoo, Amazon and Google do, but I worked in a very mature investment bank for 4 years and though they had some load testing, none of the systems I ever saw built were prepared to handle a load that might grow by a factor of 10 in a year. In fact, most of them struggled to slowly deliver even with the current load. That's true for both internal bank systems and external client-facing ones.


So your company does load testing. I didn't claim every big company tests adequately, or even has a process in place, so let's put the straw men away.

Consider apps like the major hosted tax apps on tax day (Quicken, H&R Block, etc.). Consider the load they are under--a substantial portion of the US files their taxes in a 12-hour period. It makes Twitter look wimpy. And they release what is essentially a rewrite every year.

How do they make something so critical work, in advance, under a massive load their app has never yet seen, while Twitter fails? Load that grows 100x in a year can be modeled, estimated, and simulated, and the software can be tuned and the hardware scaled in advance. They manage it. We have the technology.
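
Even a toy load driver shows the shape of the idea. A rough Python sketch -- the staging URL and the numbers are made up -- that hammers a staging mirror with ten times a minute's worth of measured traffic and reports errors and tail latency:

    import concurrent.futures
    import time
    import urllib.request

    TARGET = "http://staging.example.com/home"   # hypothetical staging mirror
    REQUESTS = 50 * 60 * 10   # measured rps x seconds in the window x multiplier

    def hit(url):
        # Time one request; treat any failure as an error.
        start = time.time()
        try:
            urllib.request.urlopen(url, timeout=5).read()
            return time.time() - start
        except Exception:
            return None

    def run():
        # Crude burst test: push the whole window's volume through a thread
        # pool and look at the error count and the tail latency.
        latencies, errors = [], 0
        with concurrent.futures.ThreadPoolExecutor(max_workers=200) as pool:
            futures = [pool.submit(hit, TARGET) for _ in range(REQUESTS)]
            for f in concurrent.futures.as_completed(futures):
                r = f.result()
                if r is None:
                    errors += 1
                else:
                    latencies.append(r)
        latencies.sort()
        p95 = latencies[int(len(latencies) * 0.95)] if latencies else None
        print("requests:", REQUESTS, "errors:", errors, "p95 latency:", p95)

    if __name__ == "__main__":
        run()

A real exercise would shape the arrival rate and replay a realistic request mix, but the principle -- find the breaking point on staging before the users find it for you -- is the same.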


As of now, this submission has 49 points. That means its point total is 2.5 times the number of words and half the number of characters in the original post (before the "postscript" was added). Does anyone else think that's a bit absurd, especially considering that it's not some timeless piece of advice or pearl of wisdom?


"I just want to go on record saying that none of Twitter's problems have ever been Rails problems."

The way the phrase is written demonstrates a lack of organization.


It also doesn't say anything. It'd be nice if he explained a little more of his position. A common thing I hear is that Rails simply puts the problem of scale into the database. The problem may not be a Rails problem, but the design of Rails doesn't solve the problem either. It just puts it somewhere else.

I have not worked with Rails, though. This is just what I've read. I'd prefer if an insider like Evan Weaver would elaborate a little more on what problems they had and how they solved them. All I know is that some people say Rails is a problem and they have evidence for that and this guy says it's not and suggests nothing more.


Many folks would say the database is where scaling problems belong. Lots and lots of programmer hours have already been expended in figuring out how to scale databases; no sense duplicating that effort in scaling your web framework.

I'd say Twitter's problem is that they're a multicast messaging app with a database backend, a combination that tends not to work so well. As an intern in college, I worked on a financial J2EE app that tried to do everything through JMS messaging queues backed by Oracle. It had similar scalability and performance problems.

In Twitter's defense, they didn't know they'd be a multicast messaging app when they started, and a database is a logical choice for what they did know they'd be (a website). They'll figure things out; they just need time and resources to rearchitect.


Many folks would say the database is where scaling problems belong.

This makes sense, but I've found the db to be a rather large bottleneck. On one hand, it's nice to store everything in a single black box so the apps don't have to care about data. On the other, databases can be slow, and I figure some kind of caching gets involved between them.

I could see that a site would simply use the db to generate static pages which would then be cached, but as you said, multicast messaging with a database backend seems like a tricky combination.

Edit: I realize I've kinda thought out loud here. I've summed up the general idea with the question below.

I take the problem to be that going to the actual database is necessary. What are the possible solutions for something like this?

Horizontal partitioning seems like the obvious answer. Is it that easy, though?


Horizontal partitioning depends on your data. Do you have lots of joins and interdependent data? Partitioning will be tricky. Do you just need to store a lump of data relevant to one customer, many times over? Horizontal partitioning is easy.

Relational databases are really cool. They're not the right technology for all apps though. The big wave-makers right now are things like Google's BigTable and Amazon's SimpleDB. They're implicitly horizontal but with reduced querying abilities compared to traditional SQL.

It's a hard problem to scale a traditional relational DB across multiple independent commodity boxes like we see at Google or with AWS. The "stuff everything in a giant hashtable" approach scales nicely across a bunch of cheap boxes.
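
A toy version of the "stuff everything in a giant hashtable" approach, in Python (hostnames invented): hash the key to pick a box, and accept that cross-box queries now happen in application code.

    import hashlib

    BOXES = ["box0.example.com", "box1.example.com", "box2.example.com"]

    def box_for(key):
        # Hash the key and map it onto one of the commodity boxes.
        digest = hashlib.md5(key.encode()).hexdigest()
        return BOXES[int(digest, 16) % len(BOXES)]

    # In-memory stand-ins for the per-box stores.
    stores = {host: {} for host in BOXES}

    def put(key, value):
        stores[box_for(key)][key] = value

    def get(key):
        return stores[box_for(key)].get(key)

    # There is no JOIN here: a query like "all tweets from the people this
    # user follows" becomes many gets merged in application code, or a
    # precomputed, denormalized value stored under the user's own key.

(Real systems use consistent hashing so that adding a box doesn't remap every key, but the shape is the same.)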



