Interesting to see Intel still being rather competitive here. My biases had me going in thinking AMD would walk away with the win.
The conclusion:
"Analysis and Conclusion
With this test we were looking to confirm that Erasure Coding on commodity hardware can be every bit as fast as dedicated hardware - without the cost or lock-in. We are happy to confirm that even running at top-of-class NIC speeds we will only use a minor fraction of CPU resources for erasure coding on all of the most popular platforms.
This means that the CPU can spend its resources on handling IO and other parts of the requests, and we can reasonably expect that any handling of external stream processors would take at least an equivalent amount of resources.
We are happy to see that Intel improved throughput on their latest platform. We look forward to testing the most recent AMD platform, and we expect its AVX512 and GFNI support to provide a further performance boost. Even if Graviton 3 turned out to be a bit behind, we don’t realistically see it becoming a significant bottleneck.
For more detailed information about installing, running, and using MinIO in any environment, please refer to our documentation. To learn more about MinIO or get involved in our community, please visit us at min.io or join our public slack channel."
My knowledge here is rather dated, and I'm unfamiliar with the MinIO code base... but roughly a decade ago Intel invested a fair amount of effort into optimizing the open-source erasure coding libraries for Intel CPUs.
Back then CPUs had fewer cores, and Reed-Solomon coding was relatively more expensive, and certainly CPU-bound on the then-new NVMe flash devices.
It's possible, even likely, that this result is a legacy of that work.
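If memory serves, MinIO's erasure coding builds on the klauspost/reedsolomon Go library, which carries that SIMD lineage forward (AVX2/AVX512/GFNI on x86, NEON on ARM). Here's a minimal sketch of the encode-then-reconstruct flow with that library, using the article's 12+4 geometry; illustrative only, not MinIO's actual code path:

    package main

    import (
        "bytes"
        "fmt"
        "log"

        "github.com/klauspost/reedsolomon"
    )

    func main() {
        // 12 data + 4 parity shards, matching the article's first benchmark pair.
        enc, err := reedsolomon.New(12, 4)
        if err != nil {
            log.Fatal(err)
        }

        data := bytes.Repeat([]byte("erasure-coding-demo "), 4096)

        // Split pads the input and slices it into 12 data shards,
        // plus 4 (still empty) parity shards.
        shards, err := enc.Split(data)
        if err != nil {
            log.Fatal(err)
        }

        // Encode fills in the parity shards; this is the hot loop the
        // SIMD (and now GFNI) kernels accelerate.
        if err := enc.Encode(shards); err != nil {
            log.Fatal(err)
        }

        // Simulate up to 4 erasures (lost disks/shards)...
        shards[0], shards[5], shards[13] = nil, nil, nil

        // ...and rebuild them from the 13 survivors.
        if err := enc.Reconstruct(shards); err != nil {
            log.Fatal(err)
        }

        ok, err := enc.Verify(shards)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("all shards consistent:", ok)
    }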
Each of the CPUs tested is from a different generation and has a different TDP. It's hard to compare two CPUs if they're not in the same TDP class or from the same generation, etc.
I thought it was spelled out quite clearly in TFA: "GFNI".
GFNI was only added to AMD CPUs in Zen 4, basically the "just released" server CPUs (codename: Genoa), which are still quite difficult to procure, even 6 months after announcement.
I don't see how that explains the encoding:reconstruction perf relationship. Looking at the first pair (12+4 encoding), EPYC is about twice as fast as the AVX2 Intel part and four times as fast as Graviton at encoding, but in reconstruction it's two to three times slower than either.
This isn’t really relevant to the article, but are there any file systems which can store files using erasure coding? (As opposed to using something like MinIO)
Oh, interesting! I still haven't played with bcachefs, so I didn't know you can do erasure coding replicas there. Although they say the implementation is still simple, so it suffers from the "write hole".
> Although they say the implementation is still simple, so it suffers from the "write hole".
That's what I was hoping Kent would show up and expand on. His March 29 update[0] says that erasure coding "has seen a lot of progress", which makes me hopeful.
I'm not sure what you're referring to with "data loss protection" in ZFS, but if you mean the 'copies' parameter, that's not erasure coding; it's just full copies of the data blocks [0]. They mostly only help with some sector errors on disk. If you lose a whole disk, THEY DO NOTHING. [1]
"The block pointer can store up to three copies of the data each pointed by a unique DVA. These blocks are referred to as “ditto” blocks in ZFS terminology." [2]
And although VMware calls RAID 5/6 "Erasure Coding" [3](!!), I'd say it's not [4].
Note that ZFS (and probably other filesystems') checksums are not data corruption protection, but data corruption detection ;)
If a checksum error is detected, you depend (mostly) on your RAIDZ protection to recover from it [0]. If you're on a single drive or a striped array without mirrors or raidz1/2/3... you're SOL. (Ditto blocks can help, but they aren't the best; I'd only recommend them if you're stuck on a single-drive setup without RAID.)
But you at least know that your data is corrupt, and that's very important too :)
See my other comment, but RAID 5/6 are not usually considered erasure coding implementations [0]. Mostly because, AFAIK, RAID 5 and 6 aren't standards, so it depends on the controller or software vendor's implementation. Some use XOR, some use Reed-Solomon, some use other stuff...
For example, Linux MD RAID 5 uses XOR, and RAID 6 uses Galois fields. [1][2]
Of course RAID 5 and 6 are erasure coding. They use a code (e.g. XOR/parity) to tolerate the behavior of a binary erasure channel (BEC). Systematic codes (where some of the code stripes are just identity chunks of the data), like RAID 5 and 6, are probably the most common case of erasure codes.
Now, we could have an argument about whether RAID 1 is an erasure code, but that wouldn't teach us much.
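To make the XOR case concrete, here's a tiny RAID-5-style sketch in Go (illustrative, not any particular RAID implementation): with parity P = D1 ^ D2 ^ D3, any single erased block is recovered by XOR-ing the survivors, which is exactly the binary-erasure-channel recovery described above.

    package main

    import "fmt"

    // xorBlocks returns the byte-wise XOR of equal-length blocks
    // (RAID-5-style parity when applied to the data blocks).
    func xorBlocks(blocks [][]byte) []byte {
        p := make([]byte, len(blocks[0]))
        for _, b := range blocks {
            for i, v := range b {
                p[i] ^= v
            }
        }
        return p
    }

    func main() {
        d1 := []byte("data-block-1")
        d2 := []byte("data-block-2")
        d3 := []byte("data-block-3")
        parity := xorBlocks([][]byte{d1, d2, d3})

        // Erase d2, then rebuild it: d1 ^ d3 ^ parity == d2.
        recovered := xorBlocks([][]byte{d1, d3, parity})
        fmt.Printf("recovered: %q\n", recovered)
    }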
Well, I stand corrected (and I don't have much knowledge to argue about this).
Re-reading, I see that in the storage world "simple" correction systems don't tend to be classified as erasure coding.
Where I say "simple", I mean any correction system that's not parameterizable, as in "let's do a 17+3 on this ordinary dataset and a 4+2 on this one that's very important".
About RAID 1: technically it probably is... although not very efficient :D
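For what it's worth, that parameterizability is exactly what general Reed-Solomon libraries expose; a trivial sketch with the same (assumed) klauspost/reedsolomon API as above, error handling elided:

    package main

    import (
        "fmt"

        "github.com/klauspost/reedsolomon"
    )

    func main() {
        // Same code, different per-dataset geometry:
        ordinary, _ := reedsolomon.New(17, 3) // survives any 3 lost shards, ~15% overhead
        precious, _ := reedsolomon.New(4, 2)  // survives any 2 lost shards, ~33% overhead
        fmt.Printf("ordinary: %T, precious: %T\n", ordinary, precious)
    }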