At SpiderOak we get the equivalent of 3x replication for about 35% overhead, using Reed-Solomon at the cluster level (on top of RAID6 at the machine level). Not nearly as expensive as outright replication.
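For anyone wanting to sanity-check that figure: a Reed-Solomon layout with k data shards and m parity shards costs m/k in extra storage and survives the loss of any m shards. The shard counts in the sketch below are hypothetical (SpiderOak's actual parameters aren't stated here); they just show how ~35% overhead can buy more loss tolerance than 3x replication's 200%.

    # Hypothetical Reed-Solomon layout: k data shards + m parity shards.
    # Overhead is m/k extra bytes per byte of user data; any m shard
    # losses are survivable. The shard counts below are illustrative,
    # NOT SpiderOak's actual configuration.

    def rs_overhead(k: int, m: int) -> float:
        """Extra storage per byte of user data for a k+m Reed-Solomon layout."""
        return m / k

    # Example: 17 data + 6 parity shards.
    k, m = 17, 6
    print(f"overhead: {rs_overhead(k, m):.1%}")   # overhead: 35.3%
    print(f"survivable shard losses: {m}")        # 6 (vs. 2 lost copies for 3x replication)
    print(f"3x replication overhead: {2:.0%}")    # 200%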
Agree those SATA port multipliers are worrisome. In the beginning, our prototype machines used them to squeeze as many drives into a single machine as possible. They have unusually low tolerance for electrical interference and make it possible for one badly malfunctioning drive to take an entire array offline until manually serviced. We've seen occasions where just touching a cable attached to a port multiplier caused the Linux kernel to emit "dazed and confused" NMI events. I am not brave enough to try them again, even in a redundant setup.
Picked a number out of thin air? RAID 6 requires 2 drives for parity and is normally used in sets of 8 or 16 drives, but it looks like they are using 45 drives. So 45/43 ≈ 1.0465, i.e. about 4.65% overhead from RAID.
Now if they lose 35% on top of that, they are at around 41% overhead. But they are taking a huge hit on write speeds, network traffic, and reliability by doing so.
Edit: Looks like they have 10,058 TB before partitioning the drives, so my guess is ~3-6 PB of actual user data.
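For reference, here is the arithmetic above spelled out. The single 45-drive RAID6 set and the 35% cluster-level figure are the assumptions from the comments above, and the usable-capacity number ignores free space and filesystem overhead.

    # Rough arithmetic behind the figures in this thread. The single 45-drive
    # RAID6 set and the 35% cluster-level Reed-Solomon figure are taken from
    # the comments above; the rest is plain multiplication.

    raw_tb = 10_058                     # raw capacity quoted in the post
    drives, parity = 45, 2              # one RAID6 set per pod, as assumed above

    raid6_overhead = parity / (drives - parity)                    # 2/43 ~= 4.65%
    cluster_overhead = 0.35                                        # claimed Reed-Solomon overhead
    combined = (1 + raid6_overhead) * (1 + cluster_overhead) - 1   # ~= 41.3%

    usable_tb = raw_tb / (1 + combined)
    print(f"RAID6 overhead:    {raid6_overhead:.2%}")   # 4.65%
    print(f"combined overhead: {combined:.2%}")         # 41.28%
    print(f"usable capacity:   {usable_tb:,.0f} TB")    # ~7,119 TB, before free space etc.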
Does SpiderOak only provide a backup service? Erasure coding is efficient for cold data. Do you use erasure coding to distribute hot data across clusters as well?