Realtime Hadoop usage at Facebook: The Complete Story (hadoopblog.blogspot.com)
67 points by LiveTheDream on July 3, 2011 | 6 comments



This is extremely interesting stuff. The posts continue to tease the problem of having to process millions of events in real time. Here is the direct link to the paper http://borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf in case anyone doesn't feel like clicking through. What really draws me in is facebook's continued and heavy use of mysql. They didn't run off to some nosql based datastore when things got tough, they learned to scale mysql, they leveraged its strong points while trying to minimize exposure to its weaknesses. And then they looked at alternatives that complement their existing systems. This is scaling at its best, and as a system administrator it's the kind of challenge I find both fascinating and exciting.


"They didn't run off to some nosql based datastore when things got tough" - not exactly... they did create Cassandra, and these days they run quite a few things on top of HBase.


Yes, however I imagine it wasn't the solution they went to in the beginning. They continued to work with mysql and then implemented caching layers before getting to the point of needing another datastore.


Interesting, as much as I don't really like facebook.

I only see Hadoop being used even more widely now - that, or spinoffs/improvements, etc., as needed, unless Hadoop really can scale as much as possible.

At least, that's what this article tells me.

http://www.infoq.com/news/2011/06/hortonworks


It sure is difficult to find companies that are NOT using Hadoop for somewhat morally questionable stuff (like ad targeting).

It's impressive what the Facebook guys did with Hadoop/HBase, but it looks like they patched here and there and everywhere to make it work for their specific use cases. Makes me wonder if a from-scratch redesign is in order to really get realtime processing to work properly with Hadoop.


Is Facebook contributing these changes upstream?



