
At SpiderOak we get a 3x replication equivalent for about 35% overhead, using Reed-Solomon at the cluster level (on top of RAID6 at the machine level). That's not nearly as expensive as outright replication.
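To make the comparison concrete, here is a minimal Python sketch of how storage overhead and loss tolerance work out for n-way replication versus a k+m Reed-Solomon-style code. The RS(6+2) parameters are hypothetical and chosen only for illustration; the comment above does not say which code SpiderOak actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    data_units: int   # k: pieces needed to reconstruct the data
    total_units: int  # n: pieces actually stored

    @property
    def overhead(self) -> float:
        """Extra raw storage as a fraction of user data: (n - k) / k."""
        return (self.total_units - self.data_units) / self.data_units

    @property
    def tolerated_losses(self) -> int:
        """For an MDS code (replication and Reed-Solomon both are), any n - k pieces may be lost."""
        return self.total_units - self.data_units


def replication(copies: int) -> Scheme:
    # Each copy alone is enough to recover the data, so k = 1.
    return Scheme(f"{copies}x replication", 1, copies)


def reed_solomon(k: int, m: int) -> Scheme:
    # k data shards plus m parity shards; any k of the k + m suffice.
    return Scheme(f"RS({k}+{m})", k, k + m)


if __name__ == "__main__":
    # RS(6+2) is a hypothetical example, not SpiderOak's published layout.
    for s in (replication(3), reed_solomon(6, 2)):
        print(f"{s.name}: overhead {s.overhead:.0%}, survives {s.tolerated_losses} losses")
```

Both schemes survive any two lost pieces, but 3x replication stores 200% extra while the 6+2 code stores about 33% extra.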

Agree those SATA port multipliers are worrisome. In the beginning, our prototype machines used them to squeeze as many drives into a single machine as possible. They have unusually low tolerance for electrical interference and make it possible for one badly malfunctioning drive to take an entire array offline until manually serviced. We've seen occasions where just touching a cable attached to a port multiplier caused the Linux kernel to emit "dazed and confused" NMI events. I am not brave enough to try them again, even in a redundant setup.




3x replication equivalent for about 35% overhead

How did you compute this "replication equivalent"?
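One possible reading of "replication equivalent" is "tolerates the same number of lost pieces". That is not the same as having the same probability of data loss, though. The sketch below assumes each stored piece fails independently with a hypothetical probability p, which real clusters (correlated failures, rebuild windows) do not satisfy, and computes the chance of losing more pieces than the scheme can tolerate.

```python
from math import comb


def loss_probability(n: int, k: int, p: float) -> float:
    """P(data loss) for an MDS scheme storing n pieces, any k of which suffice.

    Assumes each piece is lost independently with probability p; data is lost
    only when more than n - k pieces are gone at the same time.
    """
    m = n - k  # number of pieces the scheme can afford to lose
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-piece failure probability
    print(f"3x replication: {loss_probability(3, 1, p):.2e}")
    print(f"RS(6+2):        {loss_probability(8, 6, p):.2e}")
```

Both schemes tolerate two losses, but the 8-piece code has many more ways to lose three pieces (C(8,3) = 56 versus 1), so under this model its loss probability is higher than 3x replication for small p.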


Picked a number from thin air? RAID 6 requires two drives for parity and is normally used in sets of 8 or 16 drives, but it looks like they are using 45 drives. So 45/43 gives 4.65% overhead from RAID.

Now if they lose 35% on top of that, they are at around 41% overhead. But they are taking a huge hit on write speeds, network traffic, and reliability by doing so.
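The arithmetic above can be checked with a short Python sketch. It assumes, as the comment does, a single 45-drive RAID 6 set and that the 35% Reed-Solomon overhead is applied on top of RAID rather than already including it; the original SpiderOak comment doesn't say which.

```python
def raid6_overhead(drives: int) -> float:
    """RAID 6 spends two drives' worth of capacity on parity: 2 / (drives - 2)."""
    if drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return 2 / (drives - 2)


def layered_overhead(*overheads: float) -> float:
    """Overheads at independent layers compound: (1 + a)(1 + b)... - 1."""
    total = 1.0
    for o in overheads:
        total *= 1 + o
    return total - 1


if __name__ == "__main__":
    raid = raid6_overhead(45)  # assumes one 45-drive RAID 6 set
    print(f"RAID 6 over 45 drives: {raid:.2%}")
    print(f"plus 35% cluster-level: {layered_overhead(raid, 0.35):.1%}")
```

This prints 4.65% and 41.3%, matching the ~41% figure. Note the layers multiply rather than add, which is why the total is slightly more than 4.65% + 35%.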

Edit: Looks like they have 10,058 TB before partitioning the drives, so my guess is ~3-6TB of actual user data.


Does SpiderOak only provide a backup service? Erasure encoding is efficient for cold data. Do you use erasure encoding to distribute hot data across clusters?




