It's the CPU scheduler. You can easily avoid this by just not buying an X3D variant. Recent Intel CPUs can have the same problem, where processes get assigned to the efficiency cores instead of the performance cores if the right scheduler isn't in use (e.g. on Windows 10 or an old Linux distro without explicit support).
Or buy an X3D that's homogeneous, like the 5800X3D or upcoming 7800X3D.
I have no idea why AMD made the 7950X3D this weird 50/50 split. Why not put 3D V-cache on both CCDs and make it the true halo part it should be, then cut costs with the 7900X3D?
The added cache isn't free; the CCD with V-cache runs at a significantly lower clock speed. If every task were always put on the most appropriate CCD, the heterogeneous approach would be pretty much a straight advantage: frequency-sensitive workloads get high clocks and cache-sensitive workloads get lots of cache.
Whether the scheduler will ever get good enough at picking the right CCD remains to be seen, though.
Perhaps, but I have no idea how any scheduler could possibly figure out whether a workload is frequency- or cache-sensitive. The current strategy seems to be "if game, use cache; else use frequency," and it sort of works OK-ish? But then there are games that aren't detected correctly (like Factorio), and the result there is terrible.
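For what it's worth, if the scheduler guesses wrong you can always pin things yourself. Here's a minimal sketch of doing that on Linux with Python's os.sched_setaffinity; the core numbering is an assumption for illustration (logical CPUs 0-15 on the V-cache CCD, 16-31 on the frequency CCD), so check lscpu or lstopo for the real layout on your machine:

    # Sketch: manually steer a process onto one CCD on Linux.
    # The CPU ranges below are assumptions for illustration -- verify the
    # actual CCD-to-logical-CPU mapping with `lscpu -e` or `lstopo` first.
    import os
    import sys

    CACHE_CCD = set(range(0, 16))    # assumed: V-cache CCD (cores + SMT siblings)
    FREQ_CCD  = set(range(16, 32))   # assumed: high-frequency CCD

    def pin(pid, cache_sensitive):
        """Restrict `pid` to whichever CCD suits its workload."""
        cpus = CACHE_CCD if cache_sensitive else FREQ_CCD
        os.sched_setaffinity(pid, cpus)

    if __name__ == "__main__":
        # e.g. `python pin_ccd.py 12345 cache` pins PID 12345 to the cache CCD
        pin(int(sys.argv[1]), cache_sensitive=(sys.argv[2] == "cache"))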
But if you look at single-threaded results with it set to "prefer cache," it's not really that much slower. Yes, you'd have a flagship with lower single-threaded Cinebench results, but that also happens with pretty much every HEDT platform and nobody bats an eye at it. The results are still a significant step beyond the 5000 series single-core results.
There was some outcry about the 5800X3D having lower clocks. This was their "we solved the clock speed problem with a V-cache SKU" move, which was promised somewhere... And I guess, if games are your primary load and they fix the scheduler, then it seems to work for both cases.
But I don't like this franken-chip; I'd much rather have a 7800X3D. But then, I don't play games, so the scheduler would just be terrible for me.
Because the added 3D V-cache increases heat. That's also why the X3D variants have lower clocks than the non-X3D versions. Two CCDs with V-cache is apparently too hot.
That doesn't pass a sniff test at all. The 7950X3D's TDP is 120W vs. the 7950X's 170W, and even ignoring that modern TDP is a dumb metric, actual power consumption backs that up: the 7950X3D uses significantly less power than a 7950X. And, as you'd expect from that, the 7950X3D also runs cooler than the 7950X does. There's no evidence to support a thermal or power limit here.
And it wasn't that different for the 5800X vs. 5800X3D, either. The extra cache at the same TDP seems to translate to around a 200 MHz clock speed loss, with temperatures otherwise about the same.
Cache also isn't very power hungry in general. Perhaps you meant it was a thermal barrier, but again, see above: all evidence points to AMD having figured that part out, and there's no hint of thermal constraints on either the previous or the current 3D V-cache parts.
Previously that wasn't the issue; the issue (according to AMD) was a huge inter-chip communication bottleneck with the extra L3 on both chips.
> Critically, AMD has opted to equip all of their new V-cache EPYC chips with the maximum 768 MB of L3 cache, which in turn means all 8 CCDs must be present, from the top SKU (EPYC 7773X) to the bottom SKU (EPYC 7373X). Instead, AMD will be varying the number of CPU cores enabled in each CCD.
(unff 768MB of cache)
afaik AMD has said voltage is one limit, because the dies share a voltage rail, so you can't go outside the operating limits of the V-cache die. But thermals are another, because the cache die also limits thermal transfer out of the CCD. They did shave it down (to maintain package height), which helps somewhat, but I think it still runs at least a bit hotter due to the separation between dies. Voltage is, afaik, the primary limitation, though, and I've never heard of cache coherency being the problem.
I don't think they're lying about the cache die hurting clocks; that's obviously true from 5800X vs. 5800X3D, and lower clocks hurt performance in some applications, so they're legitimately trying to offer the best all-around performer they can as a general-purpose CPU. But I also think they're being very careful about how much they let Ryzen cannibalize their Epyc X3D lineup, and doing their best to salami-slice this into multiple releases.
There's no reason they couldn't have done a 5900X3D last gen, and I bet we'll eventually see a dual-cache-die release too. There is (based on Epyc) pretty obviously no technical limitation here; they just don't want to sell it to you right now. You can make excuses like "oh, the market just isn't there," but... I think they will do it eventually. For now they're just using it to distinguish between Ryzen and Epyc: previously the line was whether you wanted >8 cores, now it's 12C and 16C but only one cache die, and in a gen or three they'll throw in the towel and let you buy all-cache-die configs if that's what you want.
They very specifically said it was an interconnect issue. With the extra L3, adjacent dies made so many requests to each other's cache that it turned into a bottleneck, and this wasn't an issue on Milan-X because its IO die is apparently beefier.
I can't find the source because I'm on mobile and Google search is now flooded with X3D chatter :/
In this case it depends more on what Intel does. If Intel somehow beats this before Zen 5 is ready, I don't see why not. AMD already has all-cores-with-V-cache parts in its EPYC lineup, so it's technically possible (not yet with Zen 4, but Zen 3 parts do exist).
This is a very short-sighted approach. MCM is coming to Intel too, and hybrid architectures are coming to AMD with Zen 5. I've argued before that Zen 4 X3D is a bit of a "beta test" for Zen 5, where scheduler improvements will be far more impactful once AMD moves to a hybrid architecture like Intel did with their 12th gen. Zen 4 X3D is the time to learn things and fix the initial mistakes, before Zen 5 hits prime time, especially in the data center.
This is going to require scheduler improvements not only from the microprocessor manufacturers, but also from the Windows, BSD, and Linux kernels. There are going to be non-stop improvements to get the most out of these architectures.
Heck, Linux has an entirely new scheduler being proposed (https://www.phoronix.com/news/Nest-Linux-Scheduling-Warm-Cor...) that improves not only performance but also power efficiency, simply by keeping "hot cores hot," because the impact is far more relevant today than the last time the kernel's scheduler was seriously revisited. It's not like multi-core architecture is new, either; it's just that core counts are climbing so fast that the impact keeps getting larger.
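To make the "keep hot cores hot" idea concrete, here's a toy sketch in Python, emphatically not Nest's actual algorithm: keep a small "nest" of recently used cores, place new work on an idle warm core when you can, and only spill onto a cold core when you have to, so the warm cores stay clocked up.

    # Toy illustration of warm-core placement -- not the Nest scheduler itself,
    # just the intuition: prefer recently used ("warm") cores so they stay hot
    # and clocked up, and only wake cold cores when the warm ones are all busy.
    from collections import OrderedDict

    class WarmCorePicker:
        def __init__(self, num_cores, nest_size=4):
            self.all_cores = set(range(num_cores))
            self.nest = OrderedDict()      # warm cores, most recently used last
            self.nest_size = nest_size

        def pick(self, busy):
            """Pick a core for a new task, preferring an idle warm core."""
            for core in reversed(self.nest):          # warmest first
                if core not in busy:
                    self.nest.move_to_end(core)
                    return core
            # No warm core is idle: fall back to a cold core (or the warmest
            # busy one) and fold it into the nest, evicting the coldest entry.
            idle = self.all_cores - busy
            core = min(idle) if idle else next(reversed(self.nest), 0)
            self.nest[core] = True
            self.nest.move_to_end(core)
            if len(self.nest) > self.nest_size:
                self.nest.popitem(last=False)
            return core

    picker = WarmCorePicker(num_cores=16)
    print(picker.pick(busy={0, 1}))   # reuses/grows a small warm set instead of spreading work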
Although I'd generally agree with the sentiment "don't bet on future things you don't have today," the reality is that software is always catching up to hardware. Moreover, for major changes to CPU design, they don't YOLO out perfection in one pass; it takes a while, by many large groups, over years, to eke every scrap of performance out of it.
NAND is a great example of this. The vast increase in IO has hamstrung a lot of stuff: Linux is straight up slamming into IO scheduler limits (the NT kernel even more so) at levels achievable by normal human beings, not just big companies. That isn't because NAND is bad or flawed; it's because a near future where that kind of performance was possible wasn't considered in the IO scheduler's design.
In the case of Factorio, I'd bet good money that of ALL the game devs out there, the Factorio devs will be jumping all over this one to ensure it's resolved quickly.
Is there a point to going beyond 60 updates/s in Factorio? Because if not, the system seems to be doing the correct thing and using less power. I hope AMD isn't causing higher power usage for all users just to game some benchmarks!
Modern CPU and scheduler combos will in fact try to do that, or at least do something that has that as a second-order effect: they'll try to run the CPU at a speed that evens out regular bursts into a steady load, on the lowest-power core where that's possible.
Edit: although, for that to work for this workload Factorio itself would need to be updating at a steady 60 updates/s, which I guess is not actually the case here…
I mean, sure, if Factorio were running at a stable 60 the CPU would downclock until it was just about keeping up. My point is more that artificially running Factorio at unlocked tick rates (above 60) makes sense for a benchmark, because if it can hit 115 on that map, that tells you it'll be able to hold 60 on a map that's roughly twice as big. I.e., the CPU isn't in a different mode below 60 vs. above 60 as long as the tick rate is unlocked; there's nothing special about 60, it just happens to be what Factorio is normally limited to.
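Quick back-of-envelope on that, treating update cost as roughly linear in map size (an assumption):

    # Numbers from the comment above; update cost assumed ~linear in map size.
    unlocked_ups = 115                  # UPS on the benchmark map with the tickrate unlocked
    target_ups = 60                     # Factorio's normal update rate
    headroom = unlocked_ups / target_ups
    print(f"{headroom:.2f}x headroom")  # ~1.92x -> a map ~1.9x as heavy should still hold 60 UPS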