On a couple of GB this is true. Actually, if you have SSDs I'd expect any non-compute-bound task to be faster on a single machine up to ~10 GB, after which disk parallelism should kick in and Hadoop should start to win.
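To make that concrete, here's a minimal single-machine sketch in Python: a toy "count the first field of each log line" job fanned out over a few worker processes. The path, the job itself, and the chunking numbers are all made up for illustration; the point is just that one box can chew through a few GB like this without a cluster.

    # Toy single-machine log crunch: count the first field of each line
    # using a handful of worker processes. Path and field choice are placeholders.
    import sys
    from collections import Counter
    from concurrent.futures import ProcessPoolExecutor

    def count_chunk(lines):
        c = Counter()
        for line in lines:
            fields = line.split()
            if fields:
                c[fields[0]] += 1
        return c

    def main(path, workers=4, chunk_lines=200_000):
        # For truly huge inputs you'd bound the number of in-flight chunks;
        # for a few GB this naive version is fine.
        total = Counter()
        futures = []
        with open(path, errors="replace") as f, ProcessPoolExecutor(workers) as pool:
            chunk = []
            for line in f:
                chunk.append(line)
                if len(chunk) == chunk_lines:
                    futures.append(pool.submit(count_chunk, chunk))
                    chunk = []
            if chunk:
                futures.append(pool.submit(count_chunk, chunk))
            for fut in futures:
                total.update(fut.result())
        for key, n in total.most_common(10):
            print(n, key)

    if __name__ == "__main__":
        main(sys.argv[1])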



Depends on the dick, depends on the storage.

HDFS is a pseudo block interface. If you have a real filesystem like Lustre or GPFS, not only do you have the ability to use other tools, you can also use that storage for other things.

In the case of GPFS, you have configurable redundancy. Sadly, with Lustre you need decent hardware, otherwise you're going to lose data.

In all these things, paying bottom dollar for hardware and forgoing support is a false economy. At scales of 1 PB+ (which is about half a rack now) it's much, much cheaper to use off-the-shelf parts with 24/7 support than "softwareing" your way out.
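Rough back-of-envelope for the "about half a rack" bit; the drive size and chassis density are my own assumptions, not numbers from the comment:

    import math

    usable_pb = 1.0
    drive_tb = 10           # assumed drive size
    drives = math.ceil(usable_pb * 1000 / drive_tb)
    drives_per_2u = 12      # assumed 12-bay 2U chassis
    rack_units = math.ceil(drives / drives_per_2u) * 2

    print(drives, "drives, roughly", rack_units, "U of a 42U rack")
    # -> 100 drives, roughly 18U: about half a rack, before any replication.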


> Depends on the dick

Not really, sorry, I had to.

Back to the topic: HDFS is really somewhat of a waste of disk space, especially when used for something like munching logs.
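To put a number on it (50 TB of logs is an arbitrary example; the 3x is HDFS's default replication factor, and the 10+2 RAID 6 group on a plain filesystem is my assumption for the comparison):

    logical_tb = 50
    hdfs_raw = logical_tb * 3          # HDFS default replication factor of 3
    raid6_raw = logical_tb * 12 / 10   # e.g. a 10+2 RAID 6 group on a plain FS

    print("HDFS raw TB:  ", hdfs_raw)    # 150
    print("RAID 6 raw TB:", raid6_raw)   # 60.0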

> At scales of 1 PB+ (which is about half a rack now) it's much, much cheaper to use off-the-shelf parts with 24/7 support than "softwareing" your way out.

Depends. If you only need monthly reports from logs, then as long as you don't lose the storage completely, using second-hand hardware, or hardware decommissioned from prod, is the cheapest choice.


Ahem

Disk....


If you want disk parallelism, RAID 0 is probably easier than Hadoop.


That would depend on the data set and the stripe size. Striping is good for streaming. A linear/concat of 2+ drives with XFS would be faster with a lot of files that end up in separate AGs on separate drives, which can be accessed in parallel.
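Quick toy of the many-files case in Python: read whatever sits under a (placeholder) mount point with a small thread pool, which is the access pattern where files landing in different AGs on different drives can actually be pulled in parallel. Real throughput obviously depends on the drives and on where XFS placed the files.

    import glob
    from concurrent.futures import ThreadPoolExecutor

    def read_file(path):
        with open(path, "rb") as f:
            return len(f.read())

    paths = glob.glob("/mnt/concat/logs/*.log")   # placeholder path
    with ThreadPoolExecutor(max_workers=8) as pool:
        total = sum(pool.map(read_file, paths))
    print("read", total, "bytes from", len(paths), "files")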



