With a replication factor of 2 (for fault tolerance), it's ~4.5 TB.
FoundationDB requires SSDs, and the best we can get efficiently in our data center is ~670 GB of usable space per node (3x480 GB drives in RAID 5).
12x670 = 8040 GB
We try to keep extra space available for node failures (FDB will immediately start replicating additional data if it notices data with fewer than the configured number of replicas).
Our dataset currently grows at a decent pace, so we over-provision a bit as well.
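The capacity figures above can be tied together in a quick sketch. All the numbers come from the thread; the only assumption is that the ~4.5 TB on-disk figure already includes the 2x replication:

```python
# Back-of-the-envelope capacity math using the figures stated above.
usable_per_node_gb = 670          # 3x480 GB SSDs in RAID 5, ~670 GB usable
nodes = 12
replication_factor = 2
on_disk_gb = 4500                 # ~4.5 TB on disk at RF 2

raw_capacity_gb = usable_per_node_gb * nodes       # 12 x 670 = 8040 GB
logical_gb = on_disk_gb / replication_factor       # ~2.25 TB of actual data
headroom_gb = raw_capacity_gb - on_disk_gb         # slack for re-replication and growth

print(raw_capacity_gb, logical_gb, headroom_gb)    # 8040 2250.0 3540.0
```

So a bit under half the raw capacity is headroom, which is what absorbs both dataset growth and FDB's re-replication after a node failure.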
1) We're still interested in the nodes being as reliable as they can be. With RAID 5, we need two simultaneous disk failures to brick a node. With RAID 0 (to increase usable disk space), any of the 3 disks can brick the node.
With only 12 nodes and an RF of 2, an order-of-magnitude increase in node failures would make a service disruption much more likely. Perhaps this trade-off makes more sense in a larger cluster with a higher RF?
2) We're using commodity hardware from a dedicated server provider, which offers only RAID 1 or RAID 5 configurations by default. We haven't reached the point where we felt there was enough of an issue (or enough of a gain) to investigate changing that.
3) Having more CPUs/disks in the cluster (12 nodes rather than cramming all data on 6) can be a good thing... as FDB scales fairly linearly.
I think you misinterpreted what I said. I explained it more clearly below. I suggested using the same 12 nodes but putting each one as a RAID 0, which would get you more reliability and more storage for the same cost. In your current config, two dead disks possibly bricks the system -- in the config I propose, you'd need four dead disks before anyone noticed.
What I'm suggesting is that you think of the cluster more holistically, since I assume your goal is a reliable cluster, not reliable nodes. As a nice bonus you get more "free" disk space.
That was my interpretation of your comment, but I'm still not sure I follow. In my understanding, by using RAID 0, any single disk failure will brick a node. Each node would then have 3 disks that are ticking time bombs (multiplying the failure rate by 3). How is that more reliable?
In RAID 5, I can have 1 disk failure on a node with no problem. With 2 disk failures on the same node, I only lose 1 node of my 12-node cluster (i.e. the cluster is fine). I can also theoretically lose 12 disks (1 on each node) + 2*(RF-1) more, and gracefully repair the situation with 0 interruption.
What's the benefit of RAID 0 other than increased usable disk space and perhaps write performance? It seems you're decreasing reliability significantly for those gains.
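The "multiplying the failure rate by 3" claim above can be sketched numerically. The 3% annual disk failure rate below is an assumed placeholder, not a measured AFR, and treating the whole year as the RAID 5 rebuild window overstates RAID 5's risk:

```python
# Rough per-node loss probabilities for the RAID 0 vs RAID 5 debate above.
p = 0.03                       # assumed probability a given disk fails in a year
n = 3                          # disks per node

# RAID 0: any single disk failure bricks the node.
p_raid0 = 1 - (1 - p) ** n

# RAID 5 over 3 disks: survives one failure; bricks only on 2 or 3 failures.
# (Naively using the whole year as the rebuild window, which is pessimistic.)
p_raid5 = 3 * p**2 * (1 - p) + p**3

print(f"RAID 0 node loss: {p_raid0:.4%}, RAID 5 node loss: {p_raid5:.4%}")
```

Under these assumptions a RAID 0 node is roughly 30x more likely to be lost in a year than a RAID 5 node, which is the trade against the extra capacity.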
Perhaps to be able to recover without stressing other nodes? If a disk fails, all your reads suddenly go to 1 node in the replica set. If that same node then also has to supply the data to rebuild onto the fresh hard drive, it might interfere with read performance and/or take a long time to restore full redundancy.
Also, in theory the RAID 5 configuration would have faster reads.
But then you might as well just use an RF of 3. You get all the benefits you listed above, plus more storage (5.7TB vs 4.5TB), and less configuration hassle. And greater horizontal scalability.
And a RAID 5 will never be faster than a RAID 0 or a JBOD. :)
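One plausible derivation of the ~5.7 TB figure above, assuming RAID 0 on the same 3x480 GB disks with an RF of 3 (this reading is my assumption; the thread only states the totals):

```python
# Usable logical capacity under RAID 0 + RF 3, using the disk sizes from the thread.
disks_per_node, disk_gb, nodes = 3, 480, 12
rf = 3
raid0_per_node_gb = disks_per_node * disk_gb      # 1440 GB raw, no parity overhead
logical_gb = nodes * raid0_per_node_gb / rf       # usable data after 3x replication
print(logical_gb)                                 # 5760.0, i.e. roughly 5.7 TB
```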
FDB has a substantial per-node license cost. It makes sense to beef up the hardware on individual nodes as much as possible first, then scale out to more nodes.