Hacker News
Benchmarking Intel, AMD and Graviton Using Erasure Coding Workloads (min.io)
53 points by edogrider on May 23, 2023 | 29 comments



Interesting to see Intel still being rather competitive here. My biases had me going in thinking AMD would walk away with the win.

The conclusion:

"Analysis and Conclusion

With this test we were looking to confirm that Erasure Coding on commodity hardware can be every bit as fast as dedicated hardware - without the cost or lock-in. We are happy to confirm that even running at top-of-class NIC speeds we will only use a minor fraction of CPU resources for erasure coding on all of the most popular platforms.

This means that the CPU can spend its resources on handling IO and other parts of the requests, and we can reasonably expect that any handling of external stream processors would take at least an equivalent amount of resources.

We are happy to see that Intel improved throughput on their latest platform. We look forward to testing the most recent AMD platform, and we expect its AVX512 and GFNI support to provide a further performance boost. Even if Graviton 3 turned out to be a bit behind, we don’t realistically see it becoming a significant bottleneck. For more detailed information about installing, running, and using MinIO in any environment, please refer to our documentation. To learn more about MinIO or get involved in our community, please visit us at min.io or join our public slack channel."


> We look forward to testing the most recent AMD platform, and we expect its AVX512 and GFNI support to provide a further performance boost.

It's missing a benchmark of the latest AMD CPUs (Genoa/Epyc 4).


Ahh that's why! Thanks for clarifying.


My knowledge here is rather dated, and I'm unfamiliar with the MinIO code base... but ~a decade ago Intel invested a fair amount of effort into optimizing the OSS erasure coding libraries for Intel CPUs.

Back then CPUs had fewer cores, so RS coding was relatively more expensive, and it was certainly CPU-bound on the then-new NVMe flash devices.

It's possible, even likely, that this is the result of that work.


Each of the CPUs tested is from a different generation and has a different TDP. It's hard to compare two CPUs if they're not in the same TDP class or from the same generation.


It's really weird how poorly EPYC fares in the reconstruction benchmarks. What's going on there?


I thought it was spelled out quite clearly in TFA: "GFNI".

GFNI was only added to AMD CPUs in Zen 4, i.e. the just-released server CPUs (codename: Genoa), which are still quite difficult to procure even six months after announcement.


I don't see how that explains the encoding/reconstruction perf relationship. Looking at the first pair (12+4 encoding), Epyc is about twice as fast as AVX2 Intel and four times as fast as Graviton at encoding. But in reconstruction it's two to three times slower than either.


Genoa’s been delayed due to memory controller issues.


Even running AVX2, Intel is walloping the 48-core Epyc, on decode specifically.


This isn’t really relevant to the article, but are there any file systems which can store files using erasure coding? (As opposed to using something like MinIO)


It's not mainline yet but bcachefs[0][1][2] includes erasure coding as one of its defining features.

Maybe, if we're lucky, Kent Overstreet (koverstreet[3]) will magically appear and talk more on his latest creation.

[0] https://bcachefs.org/

[1] https://bcachefs.org/bcachefs-principles-of-operation.pdf

[2] https://www.patreon.com/bcachefs (de facto bcachefs development blog)

[3] https://news.ycombinator.com/user?id=koverstreet


Oh, interesting! I still haven't played with bcachefs, so I didn't know you could do erasure-coded replicas there. Although they say the implementation is still simple, so it suffers from a "write hole".


> Although they say the implementation is still simple, so it suffers from a "write hole".

That's what I was hoping Kent would show up and expand on. His March 29 update[0] says that erasure coding "has seen a lot of progress" which makes me hopeful.

[0] https://www.patreon.com/posts/your-irregular-80720060


I am very stoked for bcachefs to make it into the kernel.


While not a traditional file system, Tahoe-LAFS[0] uses erasure coding in its decentralized cloud store.

[0] https://tahoe-lafs.org/trac/tahoe-lafs


- ZFS with data loss protection

- Ceph with data loss protection

- Any filesystem on linux-md without data loss protection, BTRFS on linux-md on synology


I'm not sure what you're referring to with "data loss protection" in ZFS, but if you mean the 'copies' property, that's not erasure coding; that's just full copies of the data blocks [0]. They mostly only help with some sector errors on disk. If you lose a whole disk, THEY DO NOTHING. [1]

"The block pointer can store up to three copies of the data each pointed by a unique DVA. These blocks are referred to as “ditto” blocks in ZFS terminology." [2]

And although VMware calls RAID 5/6 "Erasure Coding" [3](!!), I'd say it's not [4].

  0: https://docs.oracle.com/en/operating-systems/solaris/oracle-solaris/11.4/manage-zfs/copies-property.html
  1: https://jrs-s.net/2016/05/02/zfs-copies-equals-n/
  2: https://pages.cs.wisc.edu/~kadav/zfs/zfsrel.pdf
  3: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-AD408FA8-5898-4541-9F82-FE72E6CD6227.html
  4: https://www.itprotoday.com/storage/erasure-coding-vs-raid-which-right-and-when
--

Edit: some extra info and links
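For a rough sense of why the distinction matters: full copies multiply your storage cost, while erasure coding only adds the parity fraction and still tolerates multiple losses. A quick sketch (the 12+4 layout is a hypothetical example shape, not a ZFS setting):

```python
# Raw storage cost per logical byte: N full copies cost N bytes, while
# a data+parity erasure code costs (data + parity) / data bytes.
# The 12+4 shape below is just an example layout.

def replication_overhead(copies):
    return float(copies)

def ec_overhead(data_shards, parity_shards):
    return (data_shards + parity_shards) / data_shards

assert replication_overhead(3) == 3.0           # copies=3: 3x, survives 2 losses
assert abs(ec_overhead(12, 4) - 4 / 3) < 1e-9   # 12+4: ~1.33x, survives 4 losses
```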


> For double or triple parity, we use a special case of Reed-Solomon coding

https://github.com/openzfs/zfs/blob/zfs-2.1-release/module/z...


Sorry I meant to write protection against data corruption (checksums)


Note that ZFS (and probably other fs) checksums are not data corruption protection, but data corruption detection ;)

If a checksum error is detected, you depend (mostly) on your RAIDZ protection to recover from it [0]. If you're on a single drive or striped array without mirrors or raidz1/2/3... you're SOL. (ditto blocks can help but they aren't the best. I'd only recommend them if you're stuck on a single drive setup without raid)

But you at least know that your data is corrupt, and that's very important too :)

  0: https://openzfs.github.io/openzfs-docs/Basic%20Concepts/Checksums.html
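The detect-vs-correct distinction in a few lines of plain stdlib Python (a toy illustration, not how ZFS actually computes or stores its checksums):

```python
# A checksum can only say "these bytes changed", never what they were.
import hashlib

block = b"important data"
stored_sum = hashlib.sha256(block).digest()

corrupted = b"importent data"  # silent corruption on disk
assert hashlib.sha256(corrupted).digest() != stored_sum  # detected
# ...but there is no way to reconstruct `block` from the checksum alone;
# repair needs redundancy: a mirror, a ditto block, or RAIDZ parity.
```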


https://en.wikipedia.org/wiki/RozoFS <-- never heard of it before I googled. But apparently this.

I wonder if Btrfs uses this for RAID. Though last I read, RAID 5 was still broken.


RAID 5 & 6 are examples of erasure coding.


See my other comment, but RAID 5/6 are not usually considered erasure coding implementations [0]. Mostly because, AFAIK, RAID 5 and 6 aren't standards, so it depends on the controller or software vendor's implementation. Some use XOR, some use Reed-Solomon, some use other stuff...

For example, Linux MD RAID 5 uses XOR, and RAID 6 uses Galois fields. [1][2]

  0: https://www.itprotoday.com/storage/erasure-coding-vs-raid-which-right-and-when
  1: https://raid.wiki.kernel.org/index.php/A_guide_to_mdadm#Raid_5
  2: http://igoro.com/archive/how-raid-6-dual-parity-calculation-works/
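The XOR case is small enough to sketch in a few lines of Python: lose any one shard, and the survivors XOR back to the missing one (a toy byte-level illustration, not how md actually stripes blocks):

```python
# Toy RAID 5 style parity: the parity shard is the XOR of the data
# shards, so any single lost shard equals the XOR of all survivors.

def xor_shards(shards):
    out = bytearray(len(shards[0]))
    for shard in shards:
        for i, b in enumerate(shard):
            out[i] ^= b
    return bytes(out)

data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_shards(data)

# Shard 1 is lost; rebuild it from the two survivors plus parity.
rebuilt = xor_shards([data[0], data[2], parity])
assert rebuilt == data[1]
```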


Of course RAID 5 and 6 are erasure coding. They use a code (e.g. XOR/parity) to tolerate the behavior of a binary erasure channel (BEC). Systematic codes (where some of the code stripes are just identity chunks of the data), like RAID 5 and 6, are probably the most common case of erasure codes.

Now, we could have an argument about whether RAID 1 is an erasure code, but that wouldn't teach us much.
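For the curious, the whole P+Q construction fits in a short Python sketch using the usual RAID 6 conventions (GF(2^8) with polynomial 0x11d and generator 2); this is an illustration of the math rather than anyone's production code, and it recovers any two lost data shards, which is exactly the erasure-code property being discussed:

```python
# Toy RAID 6 style dual parity: P = plain XOR, Q over GF(2^8).

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
    return p

def gf_pow2(i):
    """Generator 2 raised to the i-th power."""
    r = 1
    for _ in range(i):
        r = gf_mul(r, 2)
    return r

def gf_inv(a):
    # Brute force is fine for a demo; real code uses log tables.
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def pq_parity(shards):
    """P[k] = XOR of byte k; Q[k] = XOR of g^i * byte k of shard i."""
    p, q = bytearray(len(shards[0])), bytearray(len(shards[0]))
    for i, shard in enumerate(shards):
        gi = gf_pow2(i)
        for k, byte in enumerate(shard):
            p[k] ^= byte
            q[k] ^= gf_mul(gi, byte)
    return bytes(p), bytes(q)

def recover_two(shards, i, j, p, q):
    """Rebuild lost data shards i and j (marked None) from P and Q."""
    pxy, qxy = list(p), list(q)
    for idx, shard in enumerate(shards):
        if shard is None:
            continue
        gi = gf_pow2(idx)
        for k in range(len(pxy)):
            pxy[k] ^= shard[k]
            qxy[k] ^= gf_mul(gi, shard[k])
    a, b = gf_pow2(i), gf_pow2(j)
    inv = gf_inv(a ^ b)  # solve the 2x2 linear system over GF(2^8)
    di = bytes(gf_mul(inv, qxy[k] ^ gf_mul(b, pxy[k])) for k in range(len(pxy)))
    dj = bytes(pxy[k] ^ di[k] for k in range(len(pxy)))
    return di, dj

data = [b"RAID", b"six!", b"toy.", b"demo"]
p, q = pq_parity(data)
# Lose any two data shards, say 0 and 2, and rebuild both:
d0, d2 = recover_two([None, data[1], None, data[3]], 0, 2, p, q)
assert (d0, d2) == (data[0], data[2])
```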


Well, I stand corrected (and I don't have much knowledge to argue about this).

Re-reading, I see that in the storage world "simple" correction systems don't tend to be classified as erasure coding.

Where I say "simple", I mean any correction system that isn't parameterizable, as in "let's do 17+3 on this ordinary dataset and 4+2 on this one that's very important".

As for RAID 1, technically it probably is... although not very efficient :D


Technically xor is a special case of erasure coding.


Tl;dr: CPU isn’t a bottleneck for content delivery services. Not really surprising


See Netflix optimizing BSD for delivering video https://news.ycombinator.com/item?id=33449297



