Worst is Factorio, which is down 47% against the 5800X3D (a previous-generation CPU that also uses stacked L3 cache), on the benchmark "10K trains". Wonder what's going on there?
- "We would have expected higher performance in our Factorio benchmark, as the Ryzen 7 5800X3D and its 3D V-Cache did yield some impressive gains. This is likely due to the AMD PPM Provisioning and 3D V-Cache driver opting for frequency over cache when running this benchmark."
When they manually forced it onto the CCD with 3D V-Cache, they saw a 100% speedup:
"As we can see in our Factorio benchmark, we saw massive gains of over 100% when forcing the Ryzen 9 7950X3D to use the CCD with the 3D V-Cache as opposed to letting AMD's PPM Provisioning and 3D V-Cache Optimizer drivers do their thing automatically."
An OS can easily run an app on each CCD for a few milliseconds, look at the performance counters to see which one ends up with more instructions executed, and then move the task to where it performs best.
From time to time, it can retry the test, because software frequently changes what it spends time doing.
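For illustration, a minimal sketch of that probe-and-migrate idea on Linux, assuming (hypothetically) that the V-cache CCD is cores 0-7 and the frequency-optimized CCD is cores 8-15, and using `perf stat` to count retired instructions while the target process is pinned to each CCD in turn:

```python
# Sketch of a probe-and-migrate step (Linux). Core ID ranges are assumptions;
# on a real 7950X3D, check the cache topology under /sys/devices/system/cpu first.
# Needs permission to use perf (perf_event_paranoid) and to change affinity.
import os
import subprocess

VCACHE_CCD = set(range(0, 8))    # assumed: cores behind the 3D V-Cache
FREQ_CCD   = set(range(8, 16))   # assumed: higher-clocked CCD

def instructions_retired(pid: int, seconds: float) -> int:
    """Count retired instructions for `pid` over a short window via `perf stat`."""
    result = subprocess.run(
        ["perf", "stat", "-x,", "-e", "instructions", "-p", str(pid),
         "--", "sleep", str(seconds)],
        capture_output=True, text=True,
    )
    # With -x, perf writes CSV to stderr; the first field is the counter value.
    value = result.stderr.strip().splitlines()[-1].split(",")[0]
    return int(value) if value.isdigit() else 0

def place_on_best_ccd(pid: int, probe_seconds: float = 0.05) -> str:
    scores = {}
    for name, cores in (("vcache", VCACHE_CCD), ("freq", FREQ_CCD)):
        os.sched_setaffinity(pid, cores)               # pin to one CCD
        scores[name] = instructions_retired(pid, probe_seconds)
    best = max(scores, key=scores.get)
    os.sched_setaffinity(pid, VCACHE_CCD if best == "vcache" else FREQ_CCD)
    return best
```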
But apps like Factorio are extremely dynamic, so that's either going to be flaky or it's going to spend a lot of time/energy bouncing the process between cores.
I wouldn't be too worried about this personally, as configuring the few really sensitive apps with ProcessLasso or Aniancy is pretty easy.
I don't think the average Factorio gamer is going to know that they should do this or how. They will just know that their new CPU which was supposed to have better gaming performance is doing very, very poorly.
I'm pretty sure this is how Nvidia's Game Ready drivers work. They ship with a bunch of pre-made profiles for popular video games, and the optimal settings are applied based on which applications are running. That's also why you have to let the driver scan for your games: it needs to know which executables to watch for so it can select the right profile.
My understanding is there's a certain amount of Nvidia cleaning up after the game developers too, e.g. replacing poor-performing shaders with optimised versions, or old-Microsoft-style "let's change how the driver works to keep the game working". That's why they've been so resistant to opening the driver: they believe some of their performance advantage/secret sauce is in the driver rather than the hardware.
Given that the primary selling point of these CPUs vs. their non-X3D versions is having more cache for memory-access-heavy workloads, if that isn't a problem for your workload you're likely not in a position to gain much from the X3D variant anyway.
Right, so the 7800X3D (due in April), the old 5800X3D, or any future Threadrippers or Epycs with 3D V-Cache.
The best L3 per core outside of the X3Ds is in the 7600 or 7900 CPUs, which come with the full-size L3 (32 MB per CCD) but only 6 cores per CCD (instead of 8).
It depends on good scheduling choices, since some threads benefit from the higher-clocked CCD and others benefit from the higher-cache CCD. Sounds like AMD is using a lookup-table-style scheduler on Windows. I believe the system mostly schedules onto the higher-clock CCD by default.
Similar problem to big.LITTLE but with less differentiation between the cores. It's big.big. Kind of a mess outside of the primary use case, which is games on Windows with a statically tuned scheduler.
I didn't read the article, but according to Gamers Nexus's latest video you REALLY need the latest BIOS and chipset drivers for this part, because that's what tells Windows which CCD the game threads should be parked on.
It's the CPU scheduler. You can easily avoid this by just not buying an X3D variant. Recent Intel CPUs can have the same problem, where processes get assigned to the efficiency cores instead of the performance cores if the right scheduler isn't used (e.g. by running Windows 10 or an old Linux distro without explicit support).
Or buy an X3D that's homogeneous, like the 5800X3D or upcoming 7800X3D.
I have no idea why AMD made the 7950X3D be this weird 50/50 split. Why not make both CCDs have 3D V-cache and make it the true halo part it should be? Then cut costs with the 7900X3D?
The added cache isn't free; the CCD with vcache runs at a significantly lower clock speed. If every task was always put on the most appropriate CCD, the heterogeneous approach is pretty much a direct advantage, as frequency-sensitive workloads get high frequencies and cache-sensitive workloads get lots of cache.
Whether it will ever get good enough at picking CCD remains to be seen though.
Perhaps but I have no idea how any scheduler can possibly figure out if a workload is frequency or cache sensitive. The current strategy seems to be "if game, use cache, else use frequency" and it sorta works OK-ish? Although then there's games that aren't detected right (like Factorio) and it's then a terrible result.
But if you look at single threaded results with it set to "prefer cache" it's not really that much slower. Yes you'd have a flagship with lower single-threaded Cinebench results, but I mean that also happens with pretty much every HEDT platform and nobody bats an eye at that. The results are still a significant step beyond the 5000 series single core results.
There was some outcry about the 5800X3D having lower clocks. This was their "we solved the clock speed problem with a V-Cache SKU" move, which was promised somewhere... And I guess, if you use games as your primary load and they fix their scheduler, then it seems to work for both cases.
But I don't like this franken-chip. I would much rather have the 7800X3D. But then, I don't play games, so the scheduler would just be terrible for me.
Because the added 3D V-cache increases heat. That's also why the X3D variants have lower clocks than the non-X3D versions. Two CCDs with V-cache is apparently too hot.
That doesn't pass the sniff test at all. The 7950X3D's TDP is 120W vs. the 7950X's 170W, and even setting aside that modern TDP is a dumb metric, actual power consumption backs that up: the 7950X3D uses significantly less power than a 7950X. And, as you'd expect from that, the 7950X3D also runs cooler than the 7950X does. There's no evidence to support a thermal or power limit here.
And it wasn't that different for the 5800X vs. 5800X3D, either. The extra cache at the same TDP seems to translate to around a 200 MHz clock speed loss, with temperatures otherwise about the same.
Cache is also not very power hungry in general. Perhaps you meant it was a thermal barrier, but again see above: all evidence points to AMD having figured that part out; there's no hint of thermal constraints on either the previous or the current 3D V-Cache parts.
Previously that wasn't the issue, the issue (according to AMD) was that there was a huge interchip communication bottleneck with the extra L3 on both chips.
> Critically, AMD has opted to equip all of their new V-cache EPYC chips with the maximum 768 MB of L3 cache, which in turn means all 8 CCDs must be present, from the top SKU (EPYC 7773X) to the bottom SKU (EPYC 7373X). Instead, AMD will be varying the number of CPU cores enabled in each CCD.
(unff 768MB of cache)
AFAIK AMD has said voltage is one limit because the dies share a voltage rail, so you can't go outside the operating limits of the V-Cache die. But thermals are another, because the cache die does limit thermal transfer out of the CCD as well. They did shave it down (to maintain package height), which helps somewhat, but I think it does run at least a bit hotter due to the separation between dies. Voltage is, AFAIK, the primary limitation though, and I've never heard of cache coherency being the problem.
I don't think they're lying about the cache die hurting clocks, that's obviously true from 5800X vs 5800X3D and lower clocks hurts performance in some applications, so they're legitimately trying to offer the best all-around-performer they can as a general-purpose CPU. But I also think they're being very careful about how much they let Ryzen cannibalize their Epyc X3D lineup too, and doing their best to salami-slice this into multiple releases.
There's no reason they couldn't have done 5900X3D last gen, and I bet eventually we will see a dual-cache-die release too. There is (based on Epyc) pretty obviously no technical limitation here, they just don't want to sell it to you right now. You can make excuses like "oh the market just isn't there" but... I think they will do it eventually. For now they are just using it to distinguish between Ryzen and Epyc, previously it was if you wanted >8 cores, now it is 12C and 16C but only one cache die, in a gen or three they will throw in the towel and let you buy all-cache-die configs if that's what you want.
They very specifically said it was an interconnect issue. With the extra L3, adjacent dies made so many requests to each other's cache that it turned into a bottleneck, and this wasn't an issue on Milan-X because its IO die is apparently beefier.
I can't find the source because I am on mobile and now Google search is flooded with X3D chatter :/
In this case it depends more on what Intel does. If Intel somehow beats this before Zen 5 is ready, I don't see why not. AMD has all-cores-with-V-Cache parts available in its EPYC lineup, so it is technically possible (not yet with Zen 4, but Zen 3 parts do exist).
This is a very short-sighted approach. MCM is coming to Intel too. Moreover, hybrid architectures are coming to AMD with Zen 5. I've historically argued that Zen 4 X3D is a bit of a "beta test" prior to Zen 5, where scheduler improvements are going to be far more impactful when AMD moves to a hybrid architecture like Intel did with their 12th gen. Zen 4 X3D is the time to learn things and fix initial mistakes, before Zen 5 is much more of a prime time, especially in the data center.
This is going to require scheduler improvements not only from the microprocessor manufacturers, but also from the Windows, BSD, and Linux kernels. There are going to be non-stop improvements to get the most out of these architectures.
Heck, Linux has an entirely new scheduler (https://www.phoronix.com/news/Nest-Linux-Scheduling-Warm-Cor...) being proposed that improves not only performance but also power efficiency, simply by keeping "hot cores hot", because the impact is far more relevant today than the last time the kernel's scheduler was seriously revisited. It's not like multi-core architecture is new, either; it's just that core counts are going up so fast that the impact keeps getting larger.
Although I'd generally agree with the sentiment "don't bet on future things you don't have today", the reality is that software is always catching up to hardware. Moreover, for major changes to CPU design, they don't YOLO out perfection in one pass; it takes a while, by many large groups, over years, to eke every scrap of performance out of it.
NAND is a great example of this. The vast increase in IO has hamstrung a lot of stuff. Linux is straight up slamming into IO scheduler limits (the NT kernel even more so) at a level achievable by normal human beings, not just big companies. That isn't because NAND is bad or flawed. It's because a near future where that kind of performance was possible wasn't considered in the IO scheduler's design.
In the case of Factorio, I'd bet good money that of ALL the game devs out there, the Factorio devs will be jumping all over this one to ensure it's resolved quickly.
Is there a point to going beyond 60 updates/s in Factorio? Because if not the system seems to be doing the correct thing and using less power. I hope AMD isn’t causing higher power usage for all users just to game some benchmarks!
Modern CPU and scheduler combos will in fact try to do that or, at least, do something that has that as a second order effect. They’ll try to run the CPU at a speed that evens out regular bursts into steady loads, on the lowest power core on which that’s possible to do.
Edit: although, for that to work for this workload Factorio itself would need to be updating at a steady 60 updates/s, which I guess is not actually the case here…
I mean, sure, if Factorio was running at 60 stably the CPU would downclock until it was just about keeping up. My point is more that artificially running Factorio at unlocked tickrates (above 60) makes sense for a benchmark, because if it can hit 115 in that map that tells you it'll be able to keep 60 in a map that's twice as big. Ie. I'm saying it's not like the CPU is using a different mode below 60 and above 60 as long as the tickrate is unlocked; there's nothing special about 60, it just happens that's what Factorio is normally limited to.
The Linux tests leave me quite confused about how to pick my next Linux gaming CPU.
On the one hand, Gamers Nexus is making a point [0] (there are on/off performance charts later in the video) about why it is super important to install the latest Windows drivers + special Windows gaming software to make use of the heterogeneous architecture.
On the other hand, the Linux tests are much better than I would have expected, since it cannot know out of the box which cores to pick. Maybe a 50/50 pick is good enough, given how the charts look...
It's not quite the same out-of-the-box experience (after making sure you have the latest chipset drivers, use the balanced power plan and NOT the performance plan, and have Xbox Game Bar..), but it's possible to force certain processes and executables to run on specific CPU cores in Linux using taskset, and to verify that they're running on the correct cores. It will be a little tedious if you play a lot of games on Steam, as you'll inevitably run into games that use Proton and others that use native binaries, but there's nothing preventing you from manually scheduling games/programs that need the extra cache onto the cores that have it.
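For example (assuming, hypothetically, that cores 0-7 are the ones behind the V-cache; confirm with `lscpu --extended` and the L3 sizes under /sys on your own system), something like this does the same job as `taskset -cp` but also catches threads the game has already spawned, and verifies the result:

```python
# Pin an already-running game (and its existing threads) to the assumed
# V-cache cores, then verify. Equivalent to `taskset -cp 0-7 <pid>` per thread.
import os
import sys

VCACHE_CORES = set(range(0, 8))   # assumption: cores 0-7 sit on the V-cache CCD

def pin_process_tree(pid: int, cores: set) -> None:
    """Set CPU affinity for a process and all of its current threads."""
    tids = os.listdir(f"/proc/{pid}/task")        # one entry per thread
    for tid in tids:
        os.sched_setaffinity(int(tid), cores)
    print(f"pinned pid {pid} ({len(tids)} thread(s)) to cores {sorted(cores)}")

def verify(pid: int) -> None:
    """Show which cores the process is now allowed to run on."""
    print(f"affinity of {pid}: {sorted(os.sched_getaffinity(pid))}")

if __name__ == "__main__":
    game_pid = int(sys.argv[1])                   # e.g. the output of `pgrep factorio`
    pin_process_tree(game_pid, VCACHE_CORES)
    verify(game_pid)
```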
What about a VM with GPU passthrough? Let's say I have plenty of storage and just create different VMs, each with a selected distro or winXI, that have certain games. Could that also be an option? Just drag and drop the appropriate game into the VM storage?
Yes, for GPU passthrough you usually assign specific cores to the VM and isolate them (to reduce hiccups caused by other tasks using the core), so in this case you'd just assign the cores with the extra cache.
TBH if you are CPU-bottlenecked in your Linux games (hence you need a new CPU), you are probably playing simulation games, and these tests are kinda useless anyway.
I wish they'd included the results of the undervolted 7950X in the graphs (aka Eco Mode).
It's sad that performance benchmarks have created market conditions where a 5% performance gain is somehow worth 20-30% extra power consumption (and the noise that follows from it).
The X3D chips do at least have a reduced TDP of 120W vs 170W (except for the 7800X3D vs. the confusingly comparable 7700X's 105W). The other thing is that, at least in DIY PCs (and boutique builders like CyberPower), the "gaming" motherboards universally increase the maximum peak power target and also commonly enable things like all-core turbo and Precision Boost Overdrive by default. So unless they specify otherwise (like AnandTech, I think), you can assume most reviewers will be running the CPU with more power than the spec, but that's a common situation for their audience.
> The other thing is that, at least in DIY PC (and botique builders like CyberPower) the "gaming" motherboards universally increase the maximum peak power target
Worth noting the PPT is higher than the TDP by default: a "120W" AMD chip is actually allowed to boost to 168W continuously. Obviously they prefer to advertise the former and have people making comparisons between "120W" AMD chips and Intel chips running at full-boost PPT.
X3D chips are a cut above the rest of the market in terms of efficiency (although this mostly springs from the clock restrictions imposed by the v-cache CCD, which is undesirable and will be removed/mitigated once the technology permits) but in the rest of the lineup this slides them much closer to the Intel line than AMD would prefer people to think about. A 170W TDP 7950X is actually 220W, a 125W 7800X is actually 170W. People reflexively think in terms of AMD's marketing TDPs when making comparisons and don't account for the fact that the chips are allowed by the spec to boost way above that.
And it's been incredibly successful: in every single AMD thread there's someone talking about their nice efficient 65W processor, and lol, unless you turned off boost (which is separate from just enabling Eco mode!) then no, it's not 65W. Still good, still usually more efficient than Intel, but AMD gave themselves an instant 30% edge on paper with the power of clever marketing, and that's always been a significant amount, enough to swing purchases.
That part is not something gaming motherboards do automatically, and I've never seen one that enables multicore-enhancement-style tuning by default. If you enable overclocking yourself (and enabling XMP/DOCP is considered overclocking, since it usually increases VSOC or VCCSA to let the memory controller run faster than spec), then all bets are off, but by default, no, they shouldn't be; it's just the terminology difference that springs from AMD's marketing trickery around TDP vs. PPT. And hey, if people end up blaming partners for AMD's spec... that's just good marketing work.
PBO is the setting where you'd see power limits increased and I've never seen that on by default, it's always hidden behind a "THIS VOIDS YOUR WARRANTY AND KILLS YOUR DOG" disclaimer in a separate panel.
I just turned on Eco Mode last night and was pleasantly surprised. In a non-rigorous test on my 5600X, Eco Mode had zero change to single-core performance and unchanged CPU temps. Multicore saw a 3% performance drop, but temps were ~12C cooler.
Test yourself, but I am definitely going to leave it enabled.
A 5600X runs at only 65W or so and can be kept at around 40C pretty much all the time.
I'm not sure I would prefer to run it 16W lower, because I barely even reach the TDP, let alone Tjmax.
> To make sure gaming workloads find the right CCD, AMD has implemented a high degree of software-level control, in the form of its 3D Vertical Cache Optimizer Driver, which is included with the latest version of AMD Chipset Software. This driver ensures that workload from games are directed to the CCD with the 3D Vertical Cache using dynamic "preferred cores" flagging for the Windows OS scheduler.
Yeah, what about games on Linux? I'm pretty sure it will be scheduled all over the place.
Also, I doubt the scheduler is smart enough to detect it dynamically. Do they just use a list of known applications for that?
I was also really curious about Linux benchmarks, and it seems Phoronix, a Linux-centric tech review outlet, also released their own review with tons of game and productivity benchmarks running on the latest kernel. https://www.phoronix.com/review/amd-ryzen9-7950x3d-linux
Thanks, just saw it. Looks like it fares well in the tested games without a special scheduler, which is interesting. It falls behind the 7950X in video encoding.
Generally "game detection" is done by finding apps that use full-screen exclusive presentation. Could also look for windows that are doing the "borderless windowed" thing. That should be pretty simple and accurate on its own.
They probably do use a list of known applications plus some heuristics around graphics APIs? "List of known games" is pretty much what their graphics driver team handles as their bread and butter, so it doesn't seem like a significant stretch.
In my experience you can tune the i9-13900K way down on power and barely notice or not at all. I wouldn’t be surprised if these benchmarks came out the same way on the Intel side with power caps set at 125W.
I feel like I must be asking a dumb question, because it is a pretty obvious one. But would it be better to just buy a lower-performing, lower-cost chip in that case? Does the silicon lottery that results in an i9 also result in better performance per watt?
Yeah, it definitely depends on your use case. In my opinion the only reason to get the K or KS SKU is that you want, and are sensitive to, the extra speed bins you get on the 2 magic P-cores. The KS is a full 10% faster on single-threaded, compute-bound workloads. That's like importing the next generation CPU from the future and it is worth it for some use cases. Even with 1 core running at 6GHz you won't blow out a 125W power cap.
I haven't done this recently, and it was in a laptop CPU, but in the past I was able to get a combination of lower power/temps and better performance by undervolting an Intel CPU. Presumably something similar can be done with current-gen desktop CPUs, although perhaps they're running closer to the limits these days.
> Where AMD X3D is taking the wrecking ball to Intel is power consumption. Our simulated Ryzen 7 7800X3D is one of the most energy-efficient CPUs we ever tested. In gaming it consumes 44 W on average, while the competition is wasting a ton of energy to achieve the same FPS: 13900K (143 W), 13700K (107 W), 13600K (89 W)—all more than twice the power usage than AMD's new gaming gem.
Wow. I don't need such a CPU now, but if I needed a low-noise/silent but performant PC I would definitely look at this one.
Yeah, I’m pretty happy with my older and slower 3600, but after the preview I thought this might be an interesting upgrade in the future after prices for AM5 boards, DDR5, and the CPUs fall.
The one caveat here is that actual 7800X3Ds might not be binned as well as the top end model, so I'd wait for real 7800X3D reviews if you're thinking about pre-ordering one based on this experiment.
Based on TechPowerUp's tests, the cores with the V-Cache do not clock that high. The advertised 700 MHz higher clock rate is for the non-V-Cache cores. I highly doubt AMD would release a part that does not hit its advertised numbers (which on the 7800X3D apply to the V-Cache cores, as that is all it has).
> A 700 MHz frequency difference may sound like lot first, but as the data in our full Ryzen 9 7950X3D review shows today, the frequencies significantly above 5 GHz are only reachable on the second CCD, the one that does not have the cache, so it can run higher voltages, which lets it achieve those clock speeds.
I'm not a fan of how they park the slower CCD, since it means the scheduler will try to force all processes onto the same CCD (regardless of whether they belong to the game), instead of offloading the non-game workloads onto the other CCD. This is an issue if you run background tasks (such as streaming) or have multiple monitors set up for other things like a browser, etc.
I think I already have similar trouble with this on a Radeon GPU. Their driver (or Windows 11 itself) aggressively directs all GPU resources to the foreground app and basically stops background apps from using it. But I'm trying to record the game using OBS, and this basically makes OBS unusable if the GPU is near 100% usage. Sometimes it even gets it wrong: it decides to direct all GPU resources to the browser on my second screen, and I get 30 fps in the game. I usually need a reboot to make it work correctly again.
I'd like them to just add a toggle that lets me select which app gets the GPU first, or share it evenly instead. Their guesses never turn out right.
In addition, different game threads for the same game (or even functions within the game threads!) may benefit differently. Physics calculations may benefit from cache in a different way from other game calculations. Even if you zoom into the different type of physics calculations, there are ballistics, soft-body physics, hair, clothing, foliage, water, etc.
It would be interesting to see what kind of performance could be achieved with PERFECT scheduling. That would probably require the CPU to do process scheduling via specialized circuitry though as opposed to (or in conjunction with) the OS.
Let's be real, this just means web developers can now ship their own browser written in Electron, with the JavaScript interpreter written in Node.js, and that is literally what they will do with this.
You joke, but I can imagine a scenario where someone ports a complex GUI app to the web using something like Blazor and accidentally ends up with a full web browser running in WASM, on top of an actual browser.
I can also foresee an entire industry popping up to do this on purpose in the name of “compatibility”. I’m sure the marketing will talk about the efficiency of being able to run “true” IE6 on top of Chrome in 2030.
It’s also just a matter of time until we see Docker images for running Windows 95 apps in the browser via WINE on top of a user-mode Linux kernel emulator.
Check back here in 7 years to see which of the above prophecies materialised.
It doesn’t feel that long ago, that computers would have hundreds of megabytes of RAM. I remember it — my back is a little sore, but I’ve still got all my hair.
Yep, crazy. One of the early web dev shops I worked for had an NT server with 64 megs of RAM. They also had a Sun box with 128 megs. This was around 1995 - 96.
Software was much less bloated then. Of course, it does a lot more now (but probably not 1000x more... as I write this from a desktop with 64 gigs of RAM.)
Lots of software? Like Linux, web browsers, etc.? Arguably all software could run in 128 megs, especially if RAM were treated as "swap" and people stopped obsessing over their JavaScript UIs.
It still seems a bit far off now, but come on, in 5 years or less we'll see 1 GB; that gigabyte level is a huge implicit marketing number.
Might be a harbinger of a new movement in computing: all-cache computing, reworking OSes and slimming down libraries and other things to fit into cache. It might be the only shot at getting a slimdown revolution going before Moore's law really stalls in a couple of decades.
It would be interesting to see if an early version of Linux could run entirely out of cache. My first Linux box had 3 megs of RAM (1 on the motherboard, 2 on an expansion card.)
I wish the 64 core Threadrippers were still available for a decent price for HEDT, it's all Epyc or Threadripper Pro which are much more expensive for minimal increase in power over the 3990x.
Cache size is constrained by different things at different levels.
Cache uses 6 transistors per bit while normal RAM uses 1 transistor and 1 capacitor. You can buy a LOT more RAM per dollar than you could buy cache. This has been further exacerbated by poor SRAM scaling with each new fabrication node. It's usually 20-30% instead of 60-80% for other transistors and is actually 0% for TSMC's upcoming N3 node. AMD chose to fab their RDNA 3 GPU die on 5nm, but their SRAM dies on 6nm because the cost savings were much bigger than the effect of a slightly larger die.
DRAM typically has a maximum cycle speed of around 300-500MHz due to the capacitors having more restrictive charge/discharge rates (when you see something like 6400MT/s, that's not random access speed, but is instead the aggregate speed of slowly reading thousands of cells in parallel at one time into a cache then quickly sending the results over the wire).
Once you move away from bulk storage into high-performance caches, you have to discuss speed, latency, and associativity. As cache gets bigger, keeping lookup times low requires more and more transistors to control the cache and keep it coherent with both the CPU/threads and main memory. If you want to make that controller smaller, it will mean lowering clockspeeds, altering latencies, changing associativity, etc. At a physical level, faster caches will require higher-performance transistors which require more die space and power per transistor too.
Sometimes, the tradeoffs aren't what you might expect. A great example is AMD moving from 64kb of I-cache in Zen1 to 32kb I-cache in Zen 2, 3, and 4 because they found the tighter latencies were more important in their uarch than the higher hit rate.
There's a further constraint for AMD's 3D chip, where the cache reduces heat transfer, forcing lower CPU clock speeds. Unfortunately there's not much room on the substrate to add the cache beside the CPU. Infinity Cache on RDNA 3 also uses a significant portion of its die space for the super-fast interconnect with the GPU die. I suspect a similar thing happens here too.
AFAIK, power and speed. There's an interesting paper on this topic: [Cache Design Trade-offs for Power and Performance Optimization: A Case Study](http://iacoma.cs.uiuc.edu/CS497/LP5a.pdf).
Surprised there are three answers that don't mention area. The larger a die is, the more expensive it is to produce, since the chance of getting a defect goes up. Wafer-scale integration works by providing sufficient redundancy and/or routing to avoid damaged areas, but that too has a cost.
Die space. There's a maximum reticle size. You can't make chips bigger than that without using multi-chip techniques like wafer scale integration, chiplets and so on.
If you do a full reticle chip and use most of it for memory you can get about 1GB of SRAM. But that would also be extremely expensive.
It's not quite as bad as you might think (though it is bad).
TSMC N5 is 0.021 um2 per SRAM cell in the high-density configuration (still going to be 5-10x faster switching than DRAM). That's about 47.62 bits/um2, with 1,000,000 um2 per mm2. The EUV reticle limit is 858 mm2.
That amounts to 40,857,142,857 bits or about 4.756GB. That's almost exactly 250 dies per wafer. TSMC N5 wafers cost $17,000 or roughly $68 per chip (you can get very close to 100% yields on this kind of very basic chip). If we use part of the chip for internal controllers and interconnects, it would be about $70 per 2-4GB.
The highest-clocked DDR5 RAM I could find with minimal searching was $360 for 32GB.
The SRAM version would cost $560 at the cheapest for just the 32GB RAM not counting the RAM controller chips and all the packaging costs. With profits and everything included, your final price tag would likely be around $750 if you could produce in bulk and $1000 if you could not.
On the plus side, worrying about stuff like CAS latency would basically be a thing of the past, with the bottleneck once again becoming the speed over the wire. Power consumption would also be lower without the need to constantly refresh. I doubt that anyone wants to pay that much money for a disproportionately marginal increase in performance, but looking at all the people forking over thousands to scalpers for GPUs, maybe I'm wrong.
You might want to check your maths on that. It's more like 30 maximum-reticle-size dies per wafer IIRC. Just the area-based limit is pi * 150^2 / 858 = 82, but you obviously lose a load because dies are rectangular and wafers are circular.
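For what it's worth, a quick sanity check using the common gross-die approximation (which accounts for the round-edge loss but ignores edge exclusion, scribe lanes, and yield) lands around 60 reticle-limit dies per 300 mm wafer, so well below 250 in any case:

```python
# Rough gross-dies-per-wafer estimate for a reticle-limit (858 mm^2) die on a
# 300 mm wafer, using the standard approximation:
#   dies ~= pi*(d/2)^2 / S  -  pi*d / sqrt(2*S)
# Treat the result as an upper bound; edge exclusion, scribe lanes and yield
# all push the real number lower.
import math

WAFER_DIAMETER_MM = 300.0
DIE_AREA_MM2 = 858.0   # EUV reticle limit mentioned above

def gross_dies_per_wafer(d_mm: float, die_mm2: float) -> float:
    area_term = math.pi * (d_mm / 2) ** 2 / die_mm2       # pure area ratio (~82)
    edge_term = math.pi * d_mm / math.sqrt(2 * die_mm2)   # loss at the round edge (~23)
    return area_term - edge_term

if __name__ == "__main__":
    print(f"{gross_dies_per_wafer(WAFER_DIAMETER_MM, DIE_AREA_MM2):.0f} gross dies per wafer")
    # prints roughly 60
```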
SRAM doesn't need to be refreshed constantly like DRAM, so SRAM will use LESS power when not switching. We think it's different because something like L1 cache is CONSTANTLY switching. It should be noted that the power cost per GB still isn't worse than DRAM, but there's way more data being transferred.
Once you reach L3/L4 cache, it's not switching nearly as often and the actual total power becomes way less than sending data over from RAM. This is one reason why AMD (and now Nvidia) saved so much power with Infinity Cache on RDNA 2.
The big issue with SRAM is that it requires 6 transistors per bit while DRAM requires 1 transistor plus 1 capacitor, so capacity per mm2 of die area is much worse. That capacitor keeps things cheap, but it is also the reason why individual DRAM cells can only operate at 300-500MHz while SRAM can easily operate into the many-GHz range.
Interesting. Why is switching SRAM so much more power-consuming than just maintaining it? I thought that with a flip-flop, every gate is used on pretty much every cycle as it feeds back on itself.
I would have thought reading/writing SRAM is much more expensive than idling it not because of the cost of toggling the flip-flop, but all the cache coherency stuff associated with making a change in multicore.
AMD has a CPU extension (something called SME) that ensures VM RAM data never appears as cleartext in actual RAM. It works by encrypting the data before it leaves the CPU, using a key that is only available to the CPU itself. But I remember it being a Ryzen Pro-only feature.
The TPM is good for storing keys, but it can't actually encrypt/decrypt large amounts of data fast enough for disk encryption. So the key ends up temporarily stored in RAM.
The fact that most people leave their computers in sleep mode means that, to a determined attacker, disk encryption is bypassable by reading the key from RAM.
I think the current sleep implementation in Windows flushes the key from memory before entering sleep, and re-fetches it on-wake. That's one reason for those "BitLocker PC doesn't wake from sleep" bugs in quirky setups.
P.S.: the table in TFA lists the 7700 as having the same 105W as the 7700X. I think that's a slight mistake and the 7700's max listed TDP is 65W instead. FWIW I'm considering a 7700 because I'm not ultra happy with my 3700X (one of the reasons being it has no iGPU and I could only find a passive GPU with an Nvidia chipset, forcing me to use proprietary drivers in Linux, which is a major PITA because if I hack on the kernel I then need to apply the Nvidia patches every time).
Yes [1]! IIRC the iGPU is located in the I/O die [2] which is pretty much the same in all their consumer SKUs (so I don't expect any changes in either memory controller or the iGPU for their X3D models).
Yes, it has an iGPU like all 7xxx desktop Ryzen CPUs. But the GPU has only 2 RDNA2 CUs compared to 96 RDNA3 CUs in Radeon RX7900XTX, or 80 RDNA2 CUs in RX6900XT. For example, mobile Ryzen 6800H has 12 RDNA2 CUs. So it is fine for desktop use like the Intel (non XE) iGPUs.
I think I'll just get rid of it (give it to the wife or something): there are several things I don't like about it. No USB-C port (sure, I could add an adapter), only 1 Gb/s Ethernet on the mobo, etc. I'm not even sure I can just put a 5700G in it without having to flash the mobo.
I'll probably build something around an Asus B650-Plus mobo: it's got USB-C stock, 2.5 Gbit/s ethernet, DDR5 (I know it's pricier and boots slower, but it's faster too), etc.
I'll reuse one of the NVMe M.2 SSDs I've got lying around, so the whole build should set me back about 1K EUR (with 32 GB of DDR5), which is quite okay.
I vaguely remember that the iGPUs in the 7000 series had some trouble with Linux support. May be wrong, but do look into that. Might need a bleeding-edge kernel and mesa.
You could have at least chosen a set of filters restricted to memory that's actually compatible with the CPUs under discussion: requiring DDR5, and only supporting unbuffered DIMMs. The correct filters are: https://pcpartpicker.com/products/memory/#E=1&b=ddr5
2 x 32GB ECC UDIMM DDR5-4800 RAM is going for ~$280 here (including a 20% VAT, so I'd assume it would be a little bit cheaper in the States). Not really unreasonable, considering regular DIMMs (albeit with higher speeds like 6000) go for $400 for the same configuration.
It's not merely the modules; chipset and firmware support are required as well. Sometimes they are even incompatible, which is why certification comes up. Put together a few cheap things and the price jumps to business levels.
Inflation has really hit the consumer market... $700 for a CPU and over a thousand for a GPU... Memory is not cheap either...
Not that it won't sell, but still, these price points for something still called "consumer" are starting to be high. You could get a whole portable PC for the same money.
I was actually wondering what the original Extreme Edition P4s retailed for a couple of decades ago, and it turns out to be $999. Plus ça change, plus c’est la même chose.
> With the cuts, the wholesale cost of a 733-MHz Pentium III drops 3 percent, from $776 to $754, while the 700-MHz version drops 3 percent from $754 to $733 in volume.
Indeed. People forget that flagship Intel Extreme Edition or AMD Athlon FX chips from ~17 years ago were retailing for over $1000 in the money of that time. In today's money they would be around $1500, adjusted for inflation. Ouch!
Today, $750 for a flagship CPU that buys you the best possible compute in the consumer space is not a bad deal at all. It's never been a better time for x86 CPU shopping. You most likely don't need to buy the most expensive flagship CPU; there are great options for every budget.
The GPU market, on the other hand, is completely fucked. In 2007 you could get an 8800 GTS for $360 ($520 today), giving you near-flagship performance.
I feel like AMD and Intel are competing in the CPU space while AMD and Nvidia entered in a gentlemen's agreement to control the GPU market.
Yeah, the "top of the line" processors were often insanely priced (and some were not available for love or money, basically paper launches only available to those with connections).
https://www.redhill.net.au/iu.php is a good read down memory lane about those things; even as far back as the 286 you had this. Sometimes, very rarely and almost as a mistake, the "best bang for the buck" processor AND the "fastest in the world" would overlap.
I paid a bit more than that for my current 5950X a couple of years ago now. Top-of-the-stack CPUs have always had a pretty big price jump. Though I would class the 7950X(3D) as similar to HEDT, even if it's the top of the consumer models... if you don't need a ton of I/O, there's a lot of value there depending on what you're doing.
GPU pricing seems a bit more insane, given that a lot of the demand has actually died down at this point, and the manufacturers are definitely restricting production to keep prices (and margins) up. It's kind of sad there really isn't anything comparatively good at the lower end of the price range ($100-300), where historically you'd see around 60-70% of the performance at well under half the price.
$700 top-ish-tier CPUs are nothing new. The Pentium 4 Extreme Edition had a $1000 MSRP, for instance. Even in 2004 you could get an entire laptop for that money, even an Apple laptop, with the base model iBook G4 selling for $1000. There are still good CPUs in the $100-$300 range.
GPU prices have gone insane though with mid-range having somehow moved up to $400-$500 where high-end was a decade ago.
The GPU market is definitely suffering from having only one dominant player. I'm hoping that AMD is able to invest some of the extra cash flow since the introduction of Ryzen toward making their GPUs fully competitive with NVIDIA rather than having to play catch up all the time. Also, NVIDIA kind of has to price gouge on this generation because they made so much money during COVID and the crypto boom. Unless they want to be like Zoom or Peloton and lose 75+% of their market cap, they have to keep the profit curve from plunging. And they know there's latent demand from gamers who were priced out of the market before, plus pretty strong interest from the AI side, so why not keep the prices high to maximize profit?
I also think another factor in the rise of GPU prices is how quickly things have been moving on the display front. Not only did we go more or less directly from 1080p to 4K, thereby quadrupling resolution, now we're not shooting for 30 or even 60 fps minimums anymore. We want 1% lows to be above 120, 144, or even 240fps so that our high refresh rate monitors are properly fed. Oh, and we don't want to turn down image quality while doing it. Now throw in VR, where high-end headsets can be 4K or more per eye, and you've got a lot of pixels to push. So top-end GPUs have to be these behemoths with several kg of cooling hanging off of them, because they are now the biggest power consumer in the system. Yet they still have to fit into the real estate allotted to just a few peripheral slots, usually in an orientation that's not optimal for airflow. It's quite the engineering challenge, and I kind of see why GPUs at every level have gotten so expensive, since they're all priced relative to the top range.
This is true for the high-end, but at the same time, for about 180€ you can get a last gen 5700G which has nice performance and can play most games. Or you get the normal but more powerful CPU for that price and combine it with an ARC GPU. It's not as cheap as it once was, but it's not that bad.
>>You could get a whole portable PC for same money.
Not with the same performance you can't...
Not even close
>>The inflation has really hit the consumer market... 700$ for CPU and over thousand for GPU...
$700 for the top-of-the-line CPU is actually low; at the height of Intel's monopoly period they were charging well in excess of $1000 (closer to $1700 if I remember) for their i7 K-line enthusiast CPUs. Even today $700 is cheap compared to the Threadripper line AMD seems to have given up on.
The challenge is compounded by the fact that you need to buy a whole host of new parts at the same time. A new motherboard to handle AM5, DDR5 memory, and most likely a new PSU given how power hungry CPUs and graphics cards have become.
The way I understand it, JEDEC had standardized KB/MB/GB as power-of-2-based in memory specs before the IEC standardised KiB/MiB/GiB for the same thing. As a result it kind of just stuck in this space, since the idea of power-of-10 cache sizes is a bit silly anyway, so there is no risk of confusion.
Because there is no other interpretation of M that makes sense (neither megabits nor base-10 megabytes), you can shorten it further without loss of info, so naturally people have, since why not. A similar story applies to network interface speeds like "10G", which even makes its way into standards names like "10GBASE-SR".
Have you noticed how AMD and Intel just take tiny steps, sometimes not even worth writing about, when it comes to innovation?
It's always either AMD matching the new Intel CPU or the other way around; it's never about, e.g., completely crushing Intel.
This is the problem of late-stage capitalism, where both companies are owned by the same large investment funds, which use their power to ensure neither company can eat too much of the other's profits.
When AMD was falling behind, they seem to have been "allowed" to do Ryzen, and now it's back again to the "elephant race".
I mean... it's not in anyone's best interests for either of the only two companies in a given market to completely crush the other, right? It would be bad for consumers (increase in prices), bad for B2B (increase in prices), bad for workers (wage suppression), and bad for the market (monopolies eventually focus more of their efforts on rent-seeking than on what made them profitable in the first place).
That is only your assumption. How is Apple crushing the laptop market bad for the consumers?
Currently there is a paradox: if you want to buy a high-quality, performant, and affordable laptop, Apple is very much the only option. No competitor comes close. They sort of got caught with their pants down, as they stopped innovating, just like AMD and Intel.
https://www.anandtech.com/show/18747/the-amd-ryzen-9-7950x3d... ("CPU Benchmark Performance: Simulation")