They don't. You get twice the transistors, twice the power, and more than twice the cost (because the bonding itself costs money). See GPUs where power is increasing from 700 W to 1500 W. Moore's Law without Dennard scaling is kind of meh. You do save on networking because you need fewer servers.
I think the current version of their 3D cache does now extend over the cores and not just the cache of the underlying die.
The other big factor is that their cache chiplets are built with a fab process and standard cells that cannot tolerate high voltages, so the cores (which are on the same power rail) are constrained to not operate at the extreme end of the voltage/frequency curve where high-end desktop processors sacrifice everything to win a benchmark.
A higher component count often lets you run the same perf at a lower clock (Nvidia 2000-series mobile chips had more CUDA cores and a lower clock than the desktop parts and could match them at a lower TDP).
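A rough back-of-the-envelope sketch of why that works, assuming dynamic power scales roughly as C·V²·f and that usable voltage has to rise with frequency; the unit counts, clocks, and voltages below are made up for illustration, not real 2000-series specs:

    # Back-of-the-envelope: why more units at a lower clock can match
    # throughput in a smaller power budget. Dynamic power per unit is
    # roughly C * V^2 * f, and usable voltage rises with frequency.

    def throughput(units, freq_ghz):
        # work per second, assuming the workload scales across units
        return units * freq_ghz

    def dynamic_power(units, freq_ghz, volts):
        # arbitrary units; capacitance/activity factor folded into 1.0
        return units * volts**2 * freq_ghz

    # made-up numbers, loosely "desktop-ish" vs "mobile-ish"
    desktop = dict(units=2304, freq_ghz=1.7, volts=1.00)
    mobile  = dict(units=2944, freq_ghz=1.3, volts=0.85)

    for name, cfg in (("desktop-ish", desktop), ("mobile-ish", mobile)):
        print(name,
              "throughput:", round(throughput(cfg["units"], cfg["freq_ghz"])),
              "power:", round(dynamic_power(**cfg)))

Similar throughput, noticeably less switching power for the wider, slower configuration.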
You are right that geometry suggests a volume (heat generation) to surface area (heat dissipation) issue arising. You clearly don't want to build a sphere of pure compute layers.
But with the ability to stack compute layers AND non-compute layers of varying thickness, you can now have almost any 3D shape you like. Spacing things out will of course bring its own adverse effects from the added distance.
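To put a number on the geometry intuition: assuming heat generation scales with volume and dissipation with exposed surface, a thin slab of the same total volume has a far better surface-to-volume ratio than a compact cube (toy numbers below):

    # Toy comparison: surface-to-volume ratio (a crude proxy for how easy a
    # shape is to cool) for a cube of compute vs. a thin slab of equal volume.

    def cube_ratio(side):
        return (6 * side**2) / side**3          # = 6/side, worsens as it grows

    def slab_ratio(side, thickness):
        surface = 2 * side * side + 4 * side * thickness
        return surface / (side * side * thickness)

    volume = 8.0                                 # arbitrary units of "compute"
    cube_side = volume ** (1 / 3)
    slab_side = (volume / 0.1) ** 0.5            # 0.1-thick slab, same volume
    print("cube surface/volume:", round(cube_ratio(cube_side), 2))
    print("slab surface/volume:", round(slab_ratio(slab_side, 0.1), 2))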
Maybe stack a bunch of compute rings to form a hollow tube with liquid cooling through the middle and outside?
Or reverse that and have multiple compute sticks hanging off a baseboard dipping into your cooling vat like some reactor homage.
I think liquid cooling will become more commonplace.
Early days yet and brighter minds than mine will make things work but I am optimistic for the future!
anyway, just daydreaming with this, but I wonder if that would be feasible if every layer had fluid channels baked in? maybe oriented so convection does most of the work and submerged in subzero fluid?
would have to use a transparent chamber and really complicated looking connections ofc. need to maximize that cool factor
The compute layers are incredibly thin. Even dead-weight cooling layers transporting fluid between the compute layers would need to be kept as thin as possible.
This would require some fancy liquid management. Flow would be important and the slightest hint of a blockage most detrimental.
In similar vein to 3D printers being used to print parts to upgrade themselves:
"This generation of computing is fantastic for fluid modelling the cooling required for the next generation of computing."
These connections are big compared to other features, which means more capacitance (though far less than pins and PCB traces). So you're not going to build designs where a single design block spreads across layers into 3D; it's more for connecting logical blocks together (a CPU's L1 to L2/L3, or to memory), the sorts of places where you can spend a clock to move data between layers.
As others have mentioned, dealing with heat is an issue too: all those insulating layers don't conduct heat well either, so 3D chips tend toward the "hairy smoking golfball" scenario where getting rid of heat becomes your biggest problem.
This whole tech is mostly about solving the problem of die defects. As die size increases, yield drops sharply. One (very frequently used) solution is to build the design from a few separate blocks on one physical die and then disable the defective parts with jumpers, reconfiguring the working parts.
For example, many DDR3 parts have 6 chips on one die, but only 2–4 are enabled after final testing.
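The yield effect is easy to see with the simple Poisson yield model, Y = exp(-D·A), where D is defect density and A is die area; the defect density below is just an assumed round number:

    import math

    def poisson_yield(area_mm2, defects_per_mm2=0.001):
        # fraction of dies with zero defects under a simple Poisson model
        return math.exp(-defects_per_mm2 * area_mm2)

    # Bigger dies lose yield faster than linearly, and you also get fewer
    # dies per wafer -- hence all the binning and disabling of blocks.
    for area in (100, 200, 400, 800):
        print(f"{area:4d} mm2 die -> {poisson_yield(area):.1%} good dies")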
But if one could cheaply construct 2.5D packages, placing tested chiplets on a silicon interposer, that would create whole new fabrication opportunities.
For example, the newest Intel chips could combine one high-performance core (i7) + a few Atom cores + high-speed HBM DRAM, and even a high-power current switch or an analog die.
Looks like in the next decade we will see single-chip smartphones, etc.
To solve the heat issue let's limit ourselves to 2.5D chips, since the bottom layer barely heats.
Die size: 350 mm², at millions of connections per mm², means ~350 million connections; at 1 GHz that's 350,000 Tbit/s of memory bandwidth. The realistic number will probably be much lower, but still, even 2.5D could solve the memory wall.
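Spelling out that back-of-the-envelope figure, assuming one bit per connection per clock cycle (optimistic, and ignoring signalling overhead):

    # Back-of-the-envelope for the interposer bandwidth claim above,
    # assuming 1 bit per connection per clock cycle.
    die_area_mm2 = 350
    connections_per_mm2 = 1_000_000     # "millions of connections per mm2"
    clock_hz = 1e9                      # 1 GHz

    connections = die_area_mm2 * connections_per_mm2   # 350 million
    bits_per_sec = connections * clock_hz
    print(f"{connections/1e6:.0f}M connections -> {bits_per_sec/1e12:,.0f} Tbit/s")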
Since the interposer traffic is mostly vertical, you could add horizontal tracks of metal to conduct some heat out. Limited, but better than nothing.
I wonder how small a Stirling engine can be and power microfluidic cooling inside one such interposer.
What I really like about this is that the buses can be very wide, so even if you spend one clock cycle to do something, that something can be quite a lot of small things.
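For a sense of scale (bus widths and clocks below are made-up round numbers, not any specific product), a very wide, slow bus still moves far more per second than a narrow, fast one:

    # "Wide but slow" vs "narrow but fast": widths/clocks are illustrative only.
    def bandwidth_gb_s(bus_bits, clock_ghz):
        return bus_bits / 8 * clock_ghz     # bytes per cycle * Gcycles/s = GB/s

    print("narrow & fast:", bandwidth_gb_s(bus_bits=64, clock_ghz=4.0), "GB/s")
    print("wide & slow  :", bandwidth_gb_s(bus_bits=4096, clock_ghz=1.0), "GB/s")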
Could it be that by enabling dense, efficient connections between stacked chips, we are not only sustaining Moore’s Law but also paving the way for new chip architectures? What could integrating exotic materials and functions within a single package mean for the future of HPC?
really? the font size seems normal to me. normal for a professional publication at least. I'm on mobile atm and can't inspect, but it seems like 14pt maybe?
When I copy and paste it into a Google doc, it says it's 18pt.
Looks big to me in Safari on an iPad and in Chrome on Windows. Here are three websites (Engadget, NYTimes, and IEEE) at 100% scaling in Chrome for me (all three images are 1200x800):