You sorted by single core performance, then compared multi core performance. Sort by multi core performance, and you will see that the i9-11900K is nowhere near the top spot.
For example, the Ryzen 9 5950X has single/multi core scores of 1,688/16,645 - which is higher in multi core score than the M1 Max, but lower in the single core.
Interestingly, the iPhone's A15 SOC did get a newer version of Apple's big core this year.
>On an adjacent note, with a score of 7.28 in the integer suite, Apple’s A15 P-core is on equal footing with AMD’s Zen3-based Ryzen 5950X with a score of 7.29, and ahead of M1 with a score of 6.66.
Which is still not that much higher. Of the "consumer" CPUs only 5900X and 5950X score higher. And their stress power draw is about 2X of speculated M1 Max's.
That's maybe not a bad way to sort? Most of the time I'm interacting with a computer I'm waiting for some single thread to respond, so I want to maximize that, then look over a column to see if it will be adequate for bulk compute tasks as well.
Perhaps they were referencing the highest 8C chip. Certainly, a 5950X is faster, but it also has double the number of cores (counting only performance on the M1; I don't know if the 2 efficiency cores do anything on the multi-core benchmark). Not to mention the power consumption differences - one is in a laptop and the other is a desktop CPU.
Looking at a 1783/12693 on an 8-core CPU shows about a 10% scaling penalty from 1 to 8 cores - suppose a 32-core M1 came out for the Mac Pro that could scale only at 50% per core, that would still score over 28000, compared to the real-world top scorer, the 64-core 3990X scoring 25271.
The A15 efficiency cores will be in the next model. They are A76-level performance (flagship-level for Android from 2019-2020), but use only a tiny bit more power than the current efficiency cores.
At that point, their E-cores will have something like 80% the performance of a Zen 1 core. Zen 1 might not be the new hotness, but lots of people are perfectly fine with their Threadripper 1950X which Apple could almost match with 16 E-cores and only around 8 watts of peak power.
I suspect we'll see Apple joining ARM in three-tiered CPUs shortly. Adding a couple in-order cores just for tiny system processes that wake periodically, but don't actually do much just makes a ton of sense.
The single core is second to Intel's best but the multicore is well below in the scale, comparable to Intel Xeon W-2191B or Intel Core i9-10920X, which are 18 and 12 core beasts with TDP of up to 165W.
Which means, at least for Geekbench, Apple M1 Max has a power comparable to a very powerful desktop workstation. But if you need the absolute best of the best on multicore you can get double the performance with AMD Ryzen Threadripper 3990X at 280W TDP!
Can you imagine if Apple released some beast with similar TDP? 300W Apple M1 Unleashed, the trashcan design re-imagined, with 10X power of M1 Max if can preserve similar performance per watt. That would be 5X over the best of the best.
If Apple made an iMac Pro with similar TDP to the Intel one, and keeps the performance per watt, that would mean multicore score of about 60K, which is twice of the best processor there is in the X86 World.
I suspect, these scores don't tell the full story since the Apple SoC has specialised units for processing certain kind of data and they have direct access to the data in the memory and as a result it could be unmatched by anything but at the same time it can be comically slow for some other type of processes where X86 shines.
John Siracusa had a diagram linked here that shows the die for M1 Max, and says the ultimate desktop version is basically 4 M1 Max packages. If true, that’s a 40 core CPU 128 core GPU beast, and then we can compare to the desktop 280W Ryzens.
Interestingly, the M1 Max is only a 10 core (of which only 8 are high performance). I wonder what it will look like when it’s a 20-core, or even a 64-core like the Threadripper. Imagine a 64-core M1 on an iMac or Mac Pro.
Bloomberg's Gurman certainly has shown that he has reliable sources inside Apple over the years.
>Codenamed Jade 2C-Die and Jade 4C-Die, a redesigned Mac Pro is planned to come in 20 or 40 computing core variations, made up of 16 high-performance or 32 high-performance cores and four or eight high-efficiency cores. The chips would also include either 64 core or 128 core options for graphics.
P.S.
General rant: WTF Intel. I'm really glad there is a decoder ring but does it really have to be that hard? Is there really a need for 14 suffixes? For example, option T, power-optimized lifestyle. Is it really different from option U, mobile power efficient?
Per core performance is the most interesting metric.
Edit: for relative comparison between CPUs, per core metric is the most interesting unless you also account for heat, price and many other factors. Comparing a 56-core CPU with 10-core M1 is a meaningless comparison.
Or run heavy renders of complex ray-traced scenes.
Or do heavy 3D reconstruction from 2D images.
Or run Monte-Carlo simulations to compute complex likelihoods on parametric trading models.
Or train ML models.
The list of things you can do with a computer with many, many cores is long, and some of these (or parts thereof) are sometimes rather annoying to map to a GPU.
While working in rust I am most limited by single core performance. Incremental builds at the moment are like, 300ms compiling and 2 seconds linking. In release mode linking takes 10+ seconds with LTO turned on. The linker is entirely single threaded.
Fast cold compiles are nice, but I do that far more rarely than incremental debug builds. And there’s faster linkers (like mold[1] or lld) but lld doesn’t support macos properly and mold doesn’t support macos at all.
I’m pretty sure tsc and most javascript bundlers are also single threaded.
I wish software people cared anywhere near as much about performance as hardware engineers do. Until then, single core performance numbers will continue to matter for me.
My project [0] is about 600k lines of C++. It takes about 5m40s to build from scratch on a Ryzen Threadripper 2950X, using all 16 cores more or less maxed out. There's no option in C++ for meaningiful incremental compiles. Typically working compiles (i.e. just what is needed given whatever I've just done) are on the order of 5-45 secs, but I've noticed that knowing I can do a full rebuild in a few minutes affects my development decisions in a very positive way. I do 99% of my development work on Linux, even though the program is cross-platform, and so I get to benefit from lld(1).
The same machine does nychthemeral builds that include macOS compiles on a QEMU VM, but given that I'm asleep when that happens, I only care that the night's work is done before I get up.
Compilers typically don't use multiple cores, but the build system that invokes them do, by invoking them in parallel. Modern build systems will typically invoke commands for 1 target per core, which means that on my system for example, building my software uses all 16 cores more or less until the final steps of the process.
The speed record for building my software is held by a system with over 1k cores (a couple of seconds, compared to multiple minutes on a mid-size Threadripper).
Just running the tests in our Rails project (11k of them) can stress out a ton; we're regularly running it on 80+ cores to keep our test completion time ~3 minutes. M1 Max should let me run all tests locally much faster than I can today.
10,997 for Intel i9
vs
12,693 for M1 Max