
What if you are processing 100 petabytes? And you are comparing to a 1,000-node Hadoop cluster with each node running 64 cores and 1 TB of main memory?



Right tool for the right job. 100 petabytes is 50,000,000 times larger than the data in the post. It's the difference between touching something within reach and flying around the world.[1]

1. Earth is 40 megameters in circumference. 40Mm / 50M = 0.8m
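A quick sanity check of those numbers in Python (assuming the post's dataset is roughly 2 GB, which is what the 50,000,000x figure implies):

    PB, GB = 10**15, 10**9
    dataset = 2 * GB                  # assumed rough size of the data in the post
    ratio = (100 * PB) / dataset      # -> 50,000,000
    earth_m = 40_000_000              # Earth's circumference, ~40 megameters
    print(ratio, earth_m / ratio)     # 50000000.0 0.8  (0.8 m: "within reach")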


Then you're hardly using commodity hardware anymore. While jobs like that probably do work on Hadoop, I'd imagine a problem at that scale might be better suited to specialized systems.


IME, in most installations where Hadoop is "successfully" used, it's running on pretty high-end machines. "Commodity hardware" really means standard hardware, not cheap hardware (as opposed to buying proprietary appliances and mainframes).


Or it could be a company with 30M records a month that buys 100 x $200 servers off eBay and is still unable to query their data.


I'm not sure what your point is with a hypothetical situation. Why wouldn't they be able to query their data? All I'm saying is that, from my actual experience with real users, it's best to build a Hadoop cluster with high-quality hardware if you can.



