
What if you are processing 100 petabytes? And you are comparing to a 1,000-node Hadoop cluster with each node running 64 cores and 1 TB of main memory?



Right tool for the right job. 100 petabytes is 50,000,000 times larger than the data in the post. It's the difference between touching something within reach and flying around the world.[1]

1. Earth is 40 megameters in circumference. 40Mm / 50M = 0.8m
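A quick sanity check of those numbers in Python (assuming the post's dataset is roughly 2 GB, which is what the 50,000,000x figure implies):

    PB, GB = 10**15, 10**9
    dataset = 2 * GB                  # assumed rough size of the data in the post
    ratio = (100 * PB) / dataset      # -> 50,000,000
    earth_m = 40_000_000              # Earth's circumference, ~40 megameters
    print(ratio, earth_m / ratio)     # 50000000.0 0.8  (0.8 m: "within reach")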


Then you're hardly using commodity hardware anymore. While jobs like that probably do work on Hadoop, I'd imagine a problem at that scale might be better suited to specialized systems.


IME, in most installations where Hadoop is "successfully" used, it's running on pretty high-end machines. "Commodity hardware" really means standard hardware, not cheap hardware (as opposed to buying proprietary appliances and mainframes).


Or it could be a company with 30M records a month that buys 100 x $200 servers off eBay and is still unable to query their data.


I'm not sure what your point is with a hypothetical situation. Why wouldn't they be able to query their data? All I'm saying is that, from my actual experience with real users, it's best to build a Hadoop cluster with high-quality hardware if you can.



