With a replication factor of 2 (for fault tolerance), it's ~4.5 TB.
FoundationDB requires SSDs, and the best we can get efficiently in our data center is ~670 GB of usable space per node (3x480 GB drives in RAID 5).
12x670 = 8040 GB
We try to keep extra space available for node failures (FDB will immediately start replicating additional data if it notices data with fewer than the configured number of replicas).
Our dataset currently grows at a decent pace, so we over-provision a bit as well.
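The capacity figures above can be tied together in a quick sketch. All the numbers come from the thread; the only assumption is that the ~4.5 TB on-disk figure already includes the 2x replication:

```python
# Back-of-the-envelope capacity math using the figures stated above.
usable_per_node_gb = 670          # 3x480 GB SSDs in RAID 5, ~670 GB usable
nodes = 12
replication_factor = 2
on_disk_gb = 4500                 # ~4.5 TB on disk at RF 2

raw_capacity_gb = usable_per_node_gb * nodes       # 12 x 670 = 8040 GB
logical_gb = on_disk_gb / replication_factor       # ~2.25 TB of actual data
headroom_gb = raw_capacity_gb - on_disk_gb         # slack for re-replication and growth

print(raw_capacity_gb, logical_gb, headroom_gb)    # 8040 2250.0 3540.0
```

So a bit under half the raw capacity is headroom, which is what absorbs both dataset growth and FDB's re-replication after a node failure.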
1) We're still interested in the nodes being as reliable as they can be. With RAID 5, we need two simultaneous disk failures to brick a node. With RAID 0 (to increase usable disk space), any of the 3 disks can brick the node.
With only 12 nodes and an RF of 2, an order-of-magnitude increase in node failures would make a service disruption much more likely. Perhaps this trade-off makes more sense in a larger cluster with a higher RF?
2) We're using commodity hardware from a dedicated server provider, which offers only RAID 1 or RAID 5 configurations by default. We haven't reached the point where we felt there was enough of an issue (or enough of a gain) to investigate changing that.
3) Having more CPUs/disks in the cluster (12 nodes rather than cramming all data on 6) can be a good thing... as FDB scales fairly linearly.
I think you misinterpreted what I said. I explained it more clearly below. I suggested using the same 12 nodes but putting each one as a RAID 0, which would get you more reliability and more storage for the same cost. In your current config, two dead disks possibly bricks the system -- in the config I propose, you'd need four dead disks before anyone noticed.
What I'm suggesting is that you think of the cluster more holistically, since I assume your goal is a reliable cluster, not reliable nodes. As a nice bonus you get more "free" disk space.
That was my interpretation of your comment, but I'm still not sure I follow. In my understanding, by using RAID 0, any single disk failure will brick a node. Each node would then have 3 disks that are ticking time bombs (multiplying the failure rate by 3). How is that more reliable?
In RAID 5, I can have 1 disk failure on a node with no problem. With 2 disk failures on the same node, I only lose 1 node of my 12-node cluster (i.e. the cluster is fine). I can also theoretically lose 12 disks (1 on each node) + 2*(RF-1) more, and gracefully repair the situation with 0 interruption.
What's the benefit of RAID 0 other than increased usable disk space and perhaps write performance? It seems you're decreasing reliability significantly for those gains.
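The "multiplying the failure rate by 3" claim above can be sketched numerically. The 3% annual disk failure rate below is an assumed placeholder, not a measured AFR, and treating the whole year as the RAID 5 rebuild window overstates RAID 5's risk:

```python
# Rough per-node loss probabilities for the RAID 0 vs RAID 5 debate above.
p = 0.03                       # assumed probability a given disk fails in a year
n = 3                          # disks per node

# RAID 0: any single disk failure bricks the node.
p_raid0 = 1 - (1 - p) ** n

# RAID 5 over 3 disks: survives one failure; bricks only on 2 or 3 failures.
# (Naively using the whole year as the rebuild window, which is pessimistic.)
p_raid5 = 3 * p**2 * (1 - p) + p**3

print(f"RAID 0 node loss: {p_raid0:.4%}, RAID 5 node loss: {p_raid5:.4%}")
```

Under these assumptions a RAID 0 node is roughly 30x more likely to be lost in a year than a RAID 5 node, which is the trade against the extra capacity.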
Perhaps to be able to recover without stressing other nodes? If a disk fails, all your reads suddenly go to 1 node in the replica set. If that same node then also has to supply the data to rebuild onto the fresh hard drive, it might interfere with read performance and/or take a long time to restore full redundancy.
Also, in theory the RAID 5 configuration would have faster reads.
But then you might as well just use an RF of 3. You get all the benefits you listed above, plus more storage (5.7TB vs 4.5TB), and less configuration hassle. And greater horizontal scalability.
And a RAID 5 will never be faster than a RAID 0 or a JBOD. :)
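One plausible derivation of the ~5.7 TB figure above, assuming RAID 0 on the same 3x480 GB disks with an RF of 3 (this reading is my assumption; the thread only states the totals):

```python
# Usable logical capacity under RAID 0 + RF 3, using the disk sizes from the thread.
disks_per_node, disk_gb, nodes = 3, 480, 12
rf = 3
raid0_per_node_gb = disks_per_node * disk_gb      # 1440 GB raw, no parity overhead
logical_gb = nodes * raid0_per_node_gb / rf       # usable data after 3x replication
print(logical_gb)                                 # 5760.0, i.e. roughly 5.7 TB
```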
FDB has a substantial per-node license cost. It makes sense to beef up the hardware on individual nodes as much as possible first, then scale out to more nodes.