
It sounds great when paired with an APU. No such part was announced though.



The GPU cannot access the L3, though, so V-Cache would not help APUs. Maybe in the future AMD will have V-Infinity Cache.


What a pity!


APUs are very slow compared to discrete GPUs.


A lot of that is due to the low memory bandwidth on mainstream CPU sockets: 128-bit bus, going up to around 6GT/s currently. That's half the memory bandwidth of NVIDIA's current entry-level laptop discrete GPU. And more L3 cache on a CPU directly addresses that memory bandwidth limitation (provided it's accessible to the iGPU, which would certainly be the case if AMD put 3D cache on an APU).
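
Back-of-the-envelope, since bandwidth is just bus width times transfer rate (the RTX 4050 Laptop figures below are my assumption for "current entry-level laptop discrete GPU"):

  # GB/s = (bus width in bits / 8) * GT/s
  def bandwidth_gb_s(bus_width_bits, transfer_rate_gt_s):
      return bus_width_bits / 8 * transfer_rate_gt_s

  print(bandwidth_gb_s(128, 6))   # mainstream CPU socket, dual-channel DDR5-6000: 96 GB/s
  print(bandwidth_gb_s(96, 16))   # e.g. RTX 4050 Laptop, 96-bit GDDR6 @ 16 GT/s: 192 GB/s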


All of the Ryzen 7000 series are APUs.


With AMD finally taking a page from Intel by providing an iGPU to everyone, Intel being expected to integrate Arc into its CPUs with Meteor Lake (14th gen), and discrete GPUs from Nvidia and AMD being as bloody expensive as they are, I wonder if we're on the verge of a turning point in desktop computing, with discrete GPUs going down the path once trodden by sound cards.


Fairly common for even pedestrian discrete GPUs to use over 200W. The ludicrous ones can break 1000W. In a given generation, GPU performance is not far from linear in power consumption.

The problem is the existing iGPUs are unnecessarily anemic. Okay, the Radeon RX 6950 XT has 80 CU at 335W. You can't fit that in Socket AM5. But the Radeon RX 6400 has 12 CU at 53W. You could add that to a Ryzen 7 7700X (105W) and still have a lower TDP than a Ryzen 9 7900X (170W). You could double that and add it to a Ryzen 7 7700 (65W) and still be under 170W. You could even add that to the 7950X with the caveat that it has the base clock of the 7945HX when the iGPU is under load.

But the Ryzen 7000 series iGPU doesn't have 24 CU, or 12, it has 2. Intel's have been even slower.

The question is, why bother? 2 CU is plenty to run a monitor if you're not doing anything that benefits from a fast GPU. If you are, 12 CU is still on the low side.

But AI could change things, because then even people who don't play games will have a use for that kind of hardware. Professionals will still pay thousands of dollars for a big hot discrete GPU but there is now a case for having ten times as many CU in every business PC.

It's also starting to make more sense to put the CPU and GPU together because of unified memory. The full version of LLaMA requires ~122GB of RAM. That's petty cash in PC memory but a capital expenditure for GPUs. If AMD wanted to make a big splash, they'd drop a new Threadripper with a high TDP socket, a ton of memory channels, support for RDIMMs and a fast iGPU.

This is the thing that may finally unseat Nvidia on AI, because if every CPU from Intel or AMD starts to include an iGPU that can do AI as fast as a $300+ Nvidia discrete GPU, people are going to make the code run on them.


Keep in mind that PC sockets have very little memory bandwidth. Even the fancy new AM5 socket with DDR5 and the "recommended" overclocking to DDR5-6000 leaves you with 96GB/sec. Even the lowly RX 6400 has a dedicated 128GB/sec, which isn't trying to keep any CPU cores happy. Doubling the CUs requires double the bandwidth, which brings you up to 256GB/sec, almost as much as the new Intel Xeons (307GB/sec) with 8 channels of DDR5-4800 memory.
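
For reference, those figures all fall out of channels x channel width x transfer rate (a quick sanity check, treating each DDR5 channel as 64 bits wide):

  # GB/s = channels * (channel width in bits / 8) * GT/s
  def bw(channels, gt_per_s, width_bits=64):
      return channels * width_bits / 8 * gt_per_s

  print(bw(2, 6.0))       # AM5, dual-channel DDR5-6000:      96.0 GB/s
  print(bw(1, 16.0, 64))  # RX 6400, 64-bit GDDR6 @ 16 GT/s: 128.0 GB/s
  print(bw(8, 4.8))       # Xeon, 8-channel DDR5-4800:       307.2 GB/s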

So basically this is why Intel/AMD iGPUs stink, and why Apple's do not. Apple gives you a choice of 100, 200, and 400GB/sec in a laptop and 400 or 800GB/sec in a desktop.


DDR5-8000 is already available. DDR5-6000 is 75% as fast, which is not that different. But if you want it, you can get the same 128GB/sec from Socket AM5 as the RX 6400 right now. That's the class of GPU they could be integrating, instead of the existing ones, which are <17% as fast. It's relatively uncommon for the CPU and GPU to be memory bound on the same workload. Generally it's either one or the other.

And there are GPUs, like the RX 6900 XT, that have more CUs per GB/s than the RX 6400. At the same ratio you could have 20 CU with the bandwidth of dual channel DDR5-8000, which will no doubt become more common and less expensive as socket AM5 matures. So the problem isn't the memory bandwidth, it's that the iGPU could be bigger.
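
Working out that ratio (my arithmetic; 512GB/sec is the RX 6900 XT's published memory bandwidth, 128GB/sec the RX 6400's):

  # CU per GB/s of memory bandwidth, scaled to dual-channel DDR5-8000 (128 GB/s)
  print(80 / 512 * 128)   # RX 6900 XT ratio -> ~20 CU
  print(12 / 128 * 128)   # RX 6400 ratio    -> 12 CU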

And nothing stops them from making arbitrarily much memory bandwidth available even in existing sockets by including a few GB of HBM on the CPU package. That's essentially what Apple is doing to the exclusion of even having separate memory, but that's not optimal either. Now you can get a MacBook Pro with 96GB of very fast memory (if you want to pay $4000+), but then it can't run the full ~122GB LLaMA model, which anyone can do with ~$200 worth of DDR4 in an ordinary desktop PC.

The right answer is to provide a few GB of HBM on the CPU package which can be used as an L4 cache and to provide the bandwidth needed for a fast iGPU, then have memory slots to add an arbitrary amount of ordinary memory so that anything that won't fit there can run at lower speed, instead of not at all. That isn't on offer yet, but it's better, and it's possible, so give it a minute.


There have been numerous attempts at GPU-friendly memory systems for CPUs. Iris Pro had off-chip but in-package memory; I forget if it was cache, RAM, or eDRAM. The current Apple solution looks like the best I've seen (100, 200, 400, or 800GB/sec).

Frustratingly, the PS5 and Xbox Series X both had pretty nice memory systems for CPU+iGPU, but during the entire GPU shortage AMD never brought it to market for PCs/laptops. Although there is hope: the AMD Strix Halo, due in the 2nd half of 2024, does claim to have a 256-bit-wide memory bus.

Sure, a fancy hybrid that uses a few fast stacks of HBM and some slower DIMM slots is possible, but it's far from certain that such a design would hit the size, power, performance, and cost constraints of the market. Clearly Apple's 400GB/sec solution fits in a thin-and-light laptop with pretty good battery life.


What Apple is doing is inflexible and expensive. If you need 128GB of RAM, it isn't available. If you need 32GB of RAM, their least expensive offering appears to be a $1700 Mini. You can get a PC laptop with 32GB for under $500 and a PC desktop with 128GB for the same. You could add a $1000 discrete GPU to that and have money left over.

> Sure a fancy hybrid that uses a few fast stacks of HBM and some slower dimm slots is possible, but it's far from sure if such a design would hit the size, power, performance, and cost constraints of the market.

Slower memory costs less and uses less power than higher bandwidth configurations. Moreover, if you offered various CPUs with e.g. 8-32GB of HBM, systems that needed no more memory than that could leave their DIMM slots empty. But then systems with e.g. 16GB of HBM and 64GB of ordinary RAM would be possible, resulting in 80GB of usable memory, the most active 16GB of which has quadruple the bandwidth or more. For a much lower price than 80GB of HBM, and similar performance for anything with a <16GB working set.
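
To illustrate why the split can work, here's a toy model (made-up bandwidth numbers; it assumes the hot working set actually fits in the HBM tier):

  # average bandwidth seen by a workload split across two memory tiers
  HBM_BW, DRAM_BW = 400.0, 96.0   # GB/s, assumed figures

  def effective_bw(hot_fraction):
      # fraction of traffic served from HBM vs. ordinary DIMMs
      return hot_fraction * HBM_BW + (1 - hot_fraction) * DRAM_BW

  print(effective_bw(1.0))   # working set fits in HBM:          400.0 GB/s
  print(effective_bw(0.5))   # half the traffic spills to DIMMs: 248.0 GB/s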


> What Apple is doing is inflexible and expensive.

High bandwidth, high capacity, and cheap ... pick 2.

If you need memory bandwidth for more than 64GB, Apple is unmatched at anywhere close to the price. Apple has a unique combination of GPU-like memory bandwidth without the GPU-like memory limits.

> You could add a $1000 discrete GPU to that and have money left over.

Yes, but you'd also have 1/8th the CPU memory bandwidth and a max GPU RAM of 16-20GB.

> [proposed hybrid HBM + DRAM]

Possible, but it's a large, complex solution that will significantly increase hardware + development costs for a niche market, even if you leave the DIMMs empty. The trick is: will the HBM actually improve performance? Say you have a 128GB LLM; how much will adding 16GB of HBM improve performance? If the fast memory (HBM) is 16GB, it's going to be very tempting to just use a GPU.

> 128GB of RAM, it isn't available

The M1 Ultra Mac Studio has a 128GB config for $4,800. It hasn't been refreshed for the M2 yet, but the gen-2 options are 24GB (M2), 48GB (Pro), and 96GB (Max). It stands to reason the M2 Ultra will have twice the RAM of the M2 Max, just like the M1 Ultra allows double the M1 Max. So likely 192GB real soon. Rumors claim a release at WWDC in June; recent OS updates mention 3 new desktop models.

800GB/sec to 192GB of RAM is quite unique in today's market. There are some similar solutions, like the ARM-based A64FX, which has multiple stacks of HBM memory but a max RAM of 32GB. As you might imagine, they aren't cheap either.


DDR5 is something like 10x slower than (or 10% of the bandwidth of) the GDDR6 or HBM that's used on GPUs.

There's no point going much wider on iGPUs, because they're all throttled by DDR4/DDR5 anyway. The exception: the PS5 and Xbox both have GDDR (graphics RAM), which has 10x the bandwidth of your typical DDR4/DDR5. Since they are no longer memory-bandwidth constrained, they can go much bigger and remain practical.


> DDR5 is something like 10x slower than (or 10% of the bandwidth of) the GDDR6 or HBM that's used on GPUs.

DDR5 is something like 50% of the bandwidth of GDDR6. DDR5 runs up to 8 GT/s, GDDR6 in the range of 14-21 GT/s. For example, the Radeon RX 6400 has GDDR6 at 16 GT/s. But it also has a 64-bit memory bus, compared to 128-bit for Socket AM5. So they have the same memory bandwidth, given the fastest available DDR5.

Higher end GPUs have a wider memory bus than that, but those don't fit in the CPU socket's TDP anyway. It wouldn't prevent the iGPU from having 12+ CU instead of 2.

Moreover, higher end CPUs have a wider memory bus than that too. Socket SP5 has 12 memory channels. And a max TDP of 360W. That's as much memory bandwidth as many midrange discrete GPUs, especially anything that would fit in its TDP with a reasonable power budget left to the CPU.
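
Putting those side by side (a quick check; DDR5-4800 is what SP5/Genoa launched with, and DDR5-8000 is the fastest AM5 speed mentioned elsewhere in the thread):

  # GB/s = bus width in bits / 8 * GT/s
  print(64 * 16 / 8)        # RX 6400, 64-bit GDDR6 @ 16 GT/s:   128 GB/s
  print(128 * 8 / 8)        # Socket AM5, 128-bit DDR5-8000:     128 GB/s
  print(12 * 64 * 4.8 / 8)  # Socket SP5, 12-channel DDR5-4800: 460.8 GB/s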


They could also add some HBM as a chiplet to the CPU package. That would provide enough bandwidth to feed a fast iGPU and give the CPU a very nice L4 cache.


More like 2x. Apple, for instance, uses LPDDR5 for 100, 200, 400, or 800GB/sec.

It's mostly bus width, not the memory technology, that makes the big difference.


> With AMD finally taking a page from Intel with regards to providing an iGPU to everyone

Eh, it's wasted silicon for most of the people who will be buying those CPUs.

I'd rather see a separate chipset that integrates a GPU for those who need it, and drop a few bucks off the CPU price.


I've had my bacon saved more than enough times by integrated graphics that I consider it downright heresy to call them a waste.


Look at the size of a 4090 and think again.


The sound card analogy may not be that faulty. General-purpose sound cards have pretty much been replaced by motherboard components, while high-end sound cards (audio interfaces) for DAW (digital audio workstation) use cases are an entirely separate market.

Side note: Most of those high end audio interfaces are now external devices connected via USB (formerly also via Firewire).

And since a DAW typically doesn’t need high end video, I didn’t bother with a separate GPU on my latest DAW build. I will only add a GPU card, if I ever want to do higher end gaming, video production or machine learning on that computer.


Dedicated GPUs are a chip about the same size as a CPU + a big board with power delivery, memory modules, cooling etc. CPUs would be the same size if they had all that stuff on their own discrete board too.

Pairing a 4090 + a CPU on the same chip/side by side would be a stretch, but it has nothing to do with physical size - more the motherboard needing to handle power delivery, and the separate memory not being fast enough. Something like a 3060 is probably more reasonable.

This already exists too - the PS5 and Xbox Series X/S both use APUs with much more powerful GPUs than are available as standalone purchases.


> Dedicated GPUs are a chip about the same size as a CPU + a big board with power delivery, memory modules, cooling etc.

The 4090 has a die area of ~600mm2.

Ryzen 7 has a 71mm2 CPU die and a 122mm2 IO die.

It's not even fucking comparable

> CPUs would be the same size if they had all that stuff on their own discrete board too

Oh, oh, I know, we can call the board that has all the power and IO the CPU needs a "motherboard" cos it cradles the CPU in its hold (socket). Oh, wait, that's exactly how it is.


That's because you're comparing one of the biggest GPUs to a midrange CPU. Epyc 9654 has 12xCCD, which is bigger than the 4090. Radeon RX 6400 has a 107mm2 die, which is smaller than the Ryzen IO die.


Ryzen 7 is the high end of the gaming market in AMD's offerings. Buying an Epyc would give you worse performance for more cost and a higher power budget (games still generally benefit more from a modest core count + high single-core speed than from the high core count + modest single-core speed found in most high-end server/workstation chips).


Ryzen 7 is the midrange processor. Ryzen 9 has a higher boost clock but also has two CCD for not much more money. The high end is Threadripper (though there isn't a Zen4 one (yet?)), and that has up to 8 CCD which is still bigger than the 4090.

You're comparing a ~$350 CPU to a ~$1600 GPU.


No, Ryzen 7 is the high end of their regular desktop chips. Threadripper is a different market segment altogether, and all the ones they've released so far have been worse for gaming outside of a few titles that really benefit from the high thread count.

Intel and Ryzen consumer chips both follow the same marketing formula:

> Celeron/Athlon: ultra low end

> Ryzen 3/i3: low end

> Ryzen 5/i5: mid range

> Ryzen 7/i7: high end

> Ryzen 9/i9: I have money coming out of my ears and want a bigger number than everyone else.

CPUs in general are cheaper than GPUs - spending the same amount on the CPU as the GPU has almost never been the right move.

The fact that you have to move out of the consumer product range and into the workstation stuff to even find CPUs that cost as much as high-end consumer GPUs should make that clear - the GPU equivalent in that space would be stuff like Quadros or RTX AXXX, which, just like Threadrippers and Epycs, are worse for gaming because they focus on memory and stability rather than raw power.


The market segment for Threadripper is literally called High-End Desktop (HEDT).

The pricing for Ryzen 7 and Ryzen 9 overlaps. The Ryzen 9 7900 costs less than the Ryzen 7 7800X3D. And Ryzen 9 has better performance per dollar than Ryzen 7 on threaded workloads. The demarcation for "I have money coming out of my ears and want a bigger number than everyone else" is where dollars start going up faster than performance, which is unambiguously Threadripper/HEDT.

But Threadripper actually is significantly faster for the kind of AI workload that would otherwise be done on a GPU, or for which a fast iGPU would make sense, because it has traditionally had twice as many memory channels. If Zen 4 Threadripper ends up on Socket SP6, it will have three times as many memory channels as Socket AM5.

Zen 4-based Ryzen 3 doesn't exist and there hasn't been a new Zen-based Athlon for desktop since Zen+ three years ago.


That's a server CPU. The 7950X3D isn't that much bigger, only one extra CCD.

> Epyc 9654 has 12xCCD

That's a $7k CPU you're comparing to a $1600 GPU.


I named a big one. It's expensive.

The 9354P is $2,730 and still has 8 CCD. That's >60% more than the 4090, but it's also >60% bigger, because Epyc has a ~400mm2 I/O die on top of the CCDs.
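
Using the die sizes already quoted in this thread (71mm2 per CCD, ~400mm2 for the server I/O die, ~600mm2 for the 4090):

  ccd, io_die, ad102 = 71, 400, 600   # mm^2, figures from upthread
  epyc_9354p = 8 * ccd + io_die       # ~968 mm^2
  print(epyc_9354p / ad102)           # ~1.61 -> >60% bigger
  print(2730 / 1600)                  # ~1.71 -> >60% more expensive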


The 4090 was just a specific example; you don't have to drop much before the die size is much more reasonable - e.g., the 4070 is 295mm2.

I'm not really sure what you're arguing so aggressively about - are you saying it's impossible to pair a CPU + a high-end GPU on a single board? Because, as I mentioned, this already exists in the PS5 and XBSX; they could sell this in the PC space if they wanted to.


Grace + Hopper is very similar to a 4090 + CPU. They are each on their own die/chiplet but share a package. The CPU will of course be ARM, not x86-64, but otherwise it's very similar to what you describe.



