Interesting to see Intel still being rather competitive here. My biases had me going in thinking AMD would walk away with the win.
The conclusion:
"Analysis and Conclusion
With this test we were looking to confirm that Erasure Coding on commodity hardware can be every bit as fast as dedicated hardware - without the cost or lock-in. We are happy to confirm that even running at top-of-class NIC speeds we will only use a minor fraction of CPU resources for erasure coding on all of the most popular platforms.
This means that the CPU can spend its resources on handling IO and other parts of the requests, and we can reasonably expect that any handling of external stream processors would take at least an equivalent amount of resources.
We are happy to see that Intel improved throughput on their latest platform. We look forward to testing the most recent AMD platform, and we expect its AVX512 and GFNI support to provide a further performance boost. Even if Graviton 3 turned out to be a bit behind, we don’t realistically see it becoming a significant bottleneck.
For more detailed information about installing, running, and using MinIO in any environment, please refer to our documentation. To learn more about MinIO or get involved in our community, please visit us at min.io or join our public slack channel."
My knowledge here is rather dated, and I'm unfamiliar with the MinIO code base... but roughly a decade ago Intel invested a fair amount of effort into optimizing the open-source erasure coding libraries for Intel CPUs.
Back then CPUs had fewer cores, and Reed-Solomon coding was relatively more expensive, and certainly CPU-bound on the then-new NVMe flash devices.
It's possible, even likely, that this result is a legacy of that work.
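If memory serves, MinIO's erasure coding builds on the klauspost/reedsolomon Go library, which carries that SIMD lineage forward (AVX2/AVX512/GFNI on x86, NEON on ARM). Here's a minimal sketch of the encode-then-reconstruct flow with that library, using the article's 12+4 geometry; illustrative only, not MinIO's actual code path:

    package main

    import (
        "bytes"
        "fmt"
        "log"

        "github.com/klauspost/reedsolomon"
    )

    func main() {
        // 12 data + 4 parity shards, matching the article's first benchmark pair.
        enc, err := reedsolomon.New(12, 4)
        if err != nil {
            log.Fatal(err)
        }

        data := bytes.Repeat([]byte("erasure-coding-demo "), 4096)

        // Split pads the input and slices it into 12 data shards,
        // plus 4 (still empty) parity shards.
        shards, err := enc.Split(data)
        if err != nil {
            log.Fatal(err)
        }

        // Encode fills in the parity shards; this is the hot loop the
        // SIMD (and now GFNI) kernels accelerate.
        if err := enc.Encode(shards); err != nil {
            log.Fatal(err)
        }

        // Simulate up to 4 erasures (lost disks/shards)...
        shards[0], shards[5], shards[13] = nil, nil, nil

        // ...and rebuild them from the 13 survivors.
        if err := enc.Reconstruct(shards); err != nil {
            log.Fatal(err)
        }

        ok, err := enc.Verify(shards)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("all shards consistent:", ok)
    }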
Each of the CPUs tested is from a different generation and has a different TDP. It's hard to compare two CPUs if they're not in the same TDP class or from the same generation, etc.
I thought it was spelled out quite clearly in TFA: "GFNI".
GFNI was only added to AMD CPUs in Zen 4, basically the "just released" server CPUs (codename: Genoa), which are still quite difficult to procure, even 6 months after announcement.
I don't see how that explains the encoding:reconstruction perf relationship. Looking at the first pair (12+4 encoding), EPYC is about twice as fast as the AVX2 Intel part and four times as fast as Graviton at encoding, but in reconstruction it's two to three times slower than either.
This isn’t really relevant to the article, but are there any file systems which can store files using erasure coding? (As opposed to using something like MinIO)
Oh, interesting! I still haven't played with bcachefs, so I didn't know you can do erasure coding replicas there. Although they say the implementation is still simple, so it suffers from the "write hole".
> Although they say the implementation is still simple, so it suffers from the "write hole".
That's what I was hoping Kent would show up and expand on. His March 29 update[0] says that erasure coding "has seen a lot of progress", which makes me hopeful.
I'm not sure what you're referring to with "data loss protection" in ZFS, but if you mean the 'copies' parameter, that's not erasure coding; it's just full copies of the data blocks [0]. They mostly only help with some sector errors on disk. If you lose a whole disk, THEY DO NOTHING. [1]
"The block pointer can store up to three copies of the data each pointed by a unique DVA. These blocks are referred to as “ditto” blocks in ZFS terminology." [2]
And although VMware calls RAID 5/6 "Erasure Coding" [3](!!), I'd say it's not [4].
Note that ZFS (and probably other filesystems') checksums are not data corruption protection, but data corruption detection ;)
If a checksum error is detected, you depend (mostly) on your RAIDZ protection to recover from it [0]. If you're on a single drive or a striped array without mirrors or raidz1/2/3... you're SOL. (Ditto blocks can help, but they aren't the best; I'd only recommend them if you're stuck on a single-drive setup without RAID.)
But you at least know that your data is corrupt, and that's very important too :)
See my other comment, but RAID 5/6 are not usually considered erasure coding implementations [0]. Mostly because, AFAIK, RAID 5 and 6 aren't standards, so it depends on the controller or software vendor's implementation. Some use XOR, some use Reed-Solomon, some use other stuff...
For example, Linux MD RAID 5 uses XOR, and RAID 6 uses Galois fields. [1][2]
Of course RAID 5 and 6 are erasure coding. They use a code (e.g. XOR/parity) to tolerate the behavior of a binary erasure channel (BEC). Systematic codes (where some of the code stripes are just identity chunks of the data), like RAID 5 and 6, are probably the most common case of erasure codes.
Now, we could have an argument about whether RAID 1 is an erasure code, but that wouldn't teach us much.
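To make the XOR case concrete, here's a tiny RAID-5-style sketch in Go (illustrative, not any particular RAID implementation): with parity P = D1 ^ D2 ^ D3, any single erased block is recovered by XOR-ing the survivors, which is exactly the binary-erasure-channel recovery described above.

    package main

    import "fmt"

    // xorBlocks returns the byte-wise XOR of equal-length blocks
    // (RAID-5-style parity when applied to the data blocks).
    func xorBlocks(blocks [][]byte) []byte {
        p := make([]byte, len(blocks[0]))
        for _, b := range blocks {
            for i, v := range b {
                p[i] ^= v
            }
        }
        return p
    }

    func main() {
        d1 := []byte("data-block-1")
        d2 := []byte("data-block-2")
        d3 := []byte("data-block-3")
        parity := xorBlocks([][]byte{d1, d2, d3})

        // Erase d2, then rebuild it: d1 ^ d3 ^ parity == d2.
        recovered := xorBlocks([][]byte{d1, d3, parity})
        fmt.Printf("recovered: %q\n", recovered)
    }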
Well, I stand corrected (and I don't have much knowledge to argue about this).
Re-reading, I see that in the storage world "simple" correction systems don't tend to be classified as erasure coding.
Where I say "simple", I mean any correction system that's not parameterizable, as in "let's do a 17+3 on this ordinary dataset and a 4+2 on this one that's very important".
About RAID 1: technically it probably is... although not very efficient :D
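For what it's worth, that parameterizability is exactly what general Reed-Solomon libraries expose; a trivial sketch with the same (assumed) klauspost/reedsolomon API as above, error handling elided:

    package main

    import (
        "fmt"

        "github.com/klauspost/reedsolomon"
    )

    func main() {
        // Same code, different per-dataset geometry:
        ordinary, _ := reedsolomon.New(17, 3) // survives any 3 lost shards, ~15% overhead
        precious, _ := reedsolomon.New(4, 2)  // survives any 2 lost shards, ~33% overhead
        fmt.Printf("ordinary: %T, precious: %T\n", ordinary, precious)
    }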