Let me suggest the Quantcast File System (QFS) [0]. It's much closer to GFS as described in the paper (crucially, it uses Reed-Solomon encoding to reduce storage requirements), it's highly tunable to different workloads, and it's written in C++. Quantcast uses it to store petabytes of data and to run MapReduce jobs. Unfortunately it hasn't seen much uptake outside of Quantcast, despite being a clear improvement over HDFS.

[0] http://quantcast.github.io/qfs/

(Disclaimer: I used to work for Quantcast).


Reed-Solomon for forward error correction, to provide redundancy? But isn't Reed-Solomon really geared towards single-bit errors, while in the real world our storage tends to fail with multiple missing blocks?

I thought erasure codes were a much better approach.


It's not used for tolerating disk errors (typically in a production context you have RAID for that, and failures tend to be for the entire disk). It's used to reduce storage requirements via striping; Reed-Solomon applied this way is an erasure code, so the parity stripes reconstruct missing chunks rather than correct bit flips. See the QFS paper (http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p808-...) for a good description of how this works. The basic idea is that with RS you can get the fault tolerance of 3x replication by splitting the data into six stripes stored on different servers, plus three parity stripes. This requires 1.5x storage rather than 3x while still tolerating the loss of any three of the nine machines.
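
To make the arithmetic concrete, here's a small Python sketch (my own illustration, not QFS code; the 6+3 layout matches the default the paper describes) comparing the storage overhead of RS striping with plain replication:

    # Hypothetical sketch: storage overhead and fault tolerance of a
    # Reed-Solomon layout with k data stripes and m parity stripes,
    # vs. n-way replication. Not taken from the QFS codebase.

    def rs_overhead(k, m):
        # Bytes stored per byte of user data: k + m stripes together
        # hold k stripes' worth of actual data.
        return (k + m) / k

    def replication_overhead(n):
        # n full copies of every byte.
        return float(n)

    print(rs_overhead(6, 3))        # 1.5 -- the 6+3 RS layout
    print(replication_overhead(3))  # 3.0 -- HDFS-style 3x replication

    # Durability: 3x replication survives the loss of any 2 of the 3
    # machines holding a block's copies; RS(6, 3) can rebuild a block
    # from any 6 of its 9 stripes, so it survives any 3 losses -- at
    # half the storage cost.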


>[T]ypically in a production context you have RAID for that, and failures tend to be for the entire disk[.]

Does QFS run on top of other file systems on the storage nodes, or does it manage disks directly? I was thinking that ZFS + QFS might be a good combination and would like to know whether that's possible. Also, is QFS available for FreeBSD and/or SmartOS storage nodes and clients? And what about CoreOS and Debian: are storage nodes and clients available for those?
