Database scaling isn't their problem. Sure, they push lots of data around, but great programmers should be able to architect an optimal caching layer between the app and the database.
So now that metadata caching is out of scope here, we turn to frameworks, architectures, and application design. The issue Twitter has is in the aggregation of its metadata. People who propose a distributed solution for Twitter miss the inherent nature of Twitter; it's a centralized system. Twitter should remain as it is, but ditch RoR and become more modularized. This is where they start developing real systems; the type of systems they talk about in those mundane CS classes, like programming in C -- that stuff. Modularize the application, develop systems built for aggregation, cache the hell out of everything, and start applying some computer science.
Twitter was born as one of those next-gen Web 2.0 "keep track of your friends" hip Ruby on Rails insert-buzzword-here applications. Now it poses actual architectural challenges. It's so similar to the evolution of Facebook. Just take a step back and look at it.
You're plainly ignoring all the posts where the people involved say RoR is not the problem. Caching does not work if the data is different for everyone. Please stop hearing what you want to hear--that RoR is a trendy flash in the pan and not useful, har har--and instead take a step back and listen to what the people involved are actually saying.
"Caching does not work if the data is different for everyone."
I think caching can work, but at a different level of granularity. Rather than cache a person's full timeline, which is composed of multiple sub-feeds (each of which requires a database query), cache the data from the sub-feeds themselves, then recombine them on every page load. This would significantly lower the number of database queries, as each cache element would be invalidated only when its "owner" sends a tweet. This solution would be much more CPU intensive on the application servers, though, and Ruby may not be the best tool for the job if that were the case.
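A minimal sketch of that idea in Ruby, with a hash standing in for memcached and another for the database (all names here -- Tweet, FEED_CACHE, DB -- are illustrative, not anything Twitter actually runs):

```ruby
Tweet = Struct.new(:author, :body, :at)

FEED_CACHE = {}                             # author => cached sub-feed; stands in for memcached
DB         = Hash.new { |h, k| h[k] = [] }  # author => all tweets; stands in for the database

def post_tweet(author, body)
  DB[author] << Tweet.new(author, body, Time.now)
  FEED_CACHE.delete(author)   # invalidate only this author's sub-feed
end

def sub_feed(author)
  # read-through: one database "query" per author, and only on a cache miss
  FEED_CACHE[author] ||= DB[author].last(20)
end

def timeline(followed, limit = 20)
  # the CPU-bound part: merge cached sub-feeds on every page load,
  # so no per-viewer timeline ever needs to be cached or invalidated
  followed.flat_map { |a| sub_feed(a) }
          .sort_by(&:at).reverse
          .first(limit)
end
```

Note the trade-off described above: a tweet invalidates exactly one cache entry (its author's sub-feed), but every page load pays for the merge.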
"Traditional page caching does not work" would have been a better way for me to phrase that. I don't mean to suggest that no caching anywhere in the stack (machine or software) will be used--that would be absurd. I only mean that the cheap and easy page caching used to scale most web apps, where you keep pages or chunks of pages in a cache, goes out the window for Twitter.
You simply cannot argue that caching doesn't work just because the data is different for everyone. I understand that; it's aggregation. But the metadata lives in a database, and it is imperative that it be fronted by a RAM-based cache.
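"Fronting the database with RAM" usually means cache-aside with an expiry: check an in-memory store first, and hit the database only on a miss. A minimal Ruby sketch, assuming an in-process hash plays the role that memcached would in production (the class and its names are made up for illustration):

```ruby
class MetadataCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl: 60)
    @ttl   = ttl   # seconds a cached value stays fresh
    @store = {}    # key => Entry; stands in for a RAM cache like memcached
  end

  # fetch(key) { expensive_db_lookup } -- the block runs only on a miss
  # or after the entry has expired; otherwise the cached value is returned.
  def fetch(key)
    entry = @store[key]
    return entry.value if entry && entry.expires_at > Time.now

    value = yield
    @store[key] = Entry.new(value, Time.now + @ttl)
    value
  end
end
```

With this shape, repeated requests for the same metadata within the TTL cost a hash lookup instead of a query, which is exactly the "gloss of RAM" over the database being argued for.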
I really hope my initial post wasn't interpreted the way it was to validate that response. I don't take back anything I said--and I'm sure it's not what I want to hear. Architectures scale, not languages... Twitter simply cannot be distributed... it's not news, I've written about it earlier.