
Interesting things I’ve learned about this chip after a little sleuthing.

The CEO implies on the FYI podcast that it can handle models of at most ~4 billion parameters per wafer. Respectable, but not as large as I assumed would be possible when I first read about the scale of the chip.

CEO claims model parallelism will actually work well with these devices. Would be intriguing to know the limits of this.

Based on the cooling requirements, a clock speed of at least 1 GHz appears likely. If we take the heretical position that 1 parameter is >= 1 synapse, 4 billion parameters is about the size of a bee brain. 1 GHz is about 5 million times faster than a bee brain fires.

It would take about 20000 such chips to simulate a human-sized network. Not economical, and presumably the model parallelism would break long before this. But it is interesting to note that under the not-implausible 1 parameter >= 1 synapse assumption, we are only a few orders of magnitude away from human-sized networks training 5 million times real-time.
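Rough numbers in Python, for the arithmetic above. The synapse counts and the biological firing ceiling are my own assumptions, not anything from the podcast:

    # Back-of-envelope for the claims above. All biology numbers are
    # assumptions (rough textbook-scale estimates), not podcast figures.
    PARAMS_PER_WAFER = 4e9     # CEO's claimed max model size per wafer
    BEE_SYNAPSES = 1e9         # assumed bee brain, order of magnitude
    HUMAN_SYNAPSES = 1e14      # assumed human brain, ~100 trillion synapses
    CHIP_CLOCK_HZ = 1e9        # the ~1 GHz inferred from the cooling
    BIO_RATE_HZ = 200          # assumed peak biological firing rate

    print(PARAMS_PER_WAFER / BEE_SYNAPSES)    # ~4: roughly bee-brain scale
    print(CHIP_CLOCK_HZ / BIO_RATE_HZ)        # 5e6: the "5 million times faster"
    print(HUMAN_SYNAPSES / PARAMS_PER_WAFER)  # 25000: close to the ~20000 above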




OK, a stupid question. Since this ANN is ~5 million times as fast as animal neurons, couldn't it be "multiplexed" to simulate a much larger network in biological time (with massive state-storage memory, I presume)? I realize there will be propagation dependencies, but each dependency layer could be precomputed before the next, I would guess?

Or is there a reason that this won't work (or at least won't be worth it) for ANN structures?


> Or is there a reason that this won't work (or at least won't be worth it) for ANN structures?

If all the weights and activations fit in on-chip memory, then you can do calculations at close to 100% efficiency. If you want to simulate a 20k times larger network, you also need to transfer the 4 billion parameters per iteration, which would take significantly longer. In other words, you would be seriously bottlenecked.
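To put rough numbers on that bottleneck (the bandwidth and throughput figures here are illustrative guesses, not Cerebras specs):

    # Why multiplexing a much larger model through one wafer stalls on
    # memory traffic. Bandwidth/throughput figures are illustrative guesses.
    PARAMS = 4e9            # parameters resident on the wafer at once
    BYTES_PER_PARAM = 2     # 16-bit weights
    OFFCHIP_BW = 100e9      # assumed 100 GB/s off-chip bandwidth
    ONCHIP_OPS = 1e15       # assumed ~1 peta-op/s with weights on-chip

    compute_time = 2 * PARAMS / ONCHIP_OPS   # ~2 ops per weight per pass
    transfer_time = PARAMS * BYTES_PER_PARAM / OFFCHIP_BW

    print(compute_time)                  # ~8e-6 s: the math itself
    print(transfer_time)                 # ~8e-2 s: loading the next model slice
    print(transfer_time / compute_time)  # ~1e4: ~10000x longer moving weights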


A synapse is responsible both for storing its weight, and for operating on its weight. Therefore, model size and throughput are interrelated numbers. If a synapse in a bee brain fires at 10 Hz on average (my guess), total events per second is 10 times the brain ‘capacity’. Max firing rate will be a lot larger, since most neurons are silent most of the time.

In silicon we separate out the storage from the compute. Generally we have much more storage than we have compute, and we route the data to the compute units as we need computation. Therefore ALUs have aggregate throughput of only a small fraction of the model capacity per cycle; at 400,000 cores with, say, 8 16-bit operations per cycle each, that is ‘only’ ~10^6 operations per cycle.

A better way to get a measure of computational throughput is to compare those values. Numbers are hard to find, but 10^10 synaptic events/s for a bee and 10^15 edge calculations/s for Cerebras seems the right sort of ballpark. That's a difference of 10^5, a little less than your estimate of 10^7.
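Spelling the comparison out (all of these are the ballpark figures above):

    # Bee synaptic events/s vs wafer edge calculations/s, ballpark.
    BEE_SYNAPSES = 1e9      # assumed bee brain size
    BEE_RATE_HZ = 10        # assumed average firing rate
    bee_events = BEE_SYNAPSES * BEE_RATE_HZ       # ~1e10 events/s

    CORES = 400_000
    OPS_PER_CYCLE = 8       # assumed 16-bit ops per core per cycle
    CLOCK_HZ = 1e9          # assumed ~1 GHz
    wafer_ops = CORES * OPS_PER_CYCLE * CLOCK_HZ  # ~3e15 ops/s

    print(wafer_ops / bee_events)   # ~3e5: the ~10^5 gap, not 10^7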


If we could do a full-wafer analog neural network, that's probably another two orders of magnitude.

That means we'll need 200 such chips.

Why is the wafer so expensive? Buying a manufactured 16nm wafer from TSMC is, I think, $4k-5k.

And there are still a few nodes left. While cost per gate doesn't go down much, they could solve some of the external bandwidth issues.

But all those things are really expensive and very risky. Who knows...


$4-5k per wafer (sounds very low to me) doesn't say anything about yield losses from manufacturing defects. The smaller a chip is, the higher your yield per wafer, because each defect destroys fewer chips.

When you have only a few giant chips per wafer, each defect becomes devastating, taking out a large fraction of the wafer. Defects are so common I wouldn't be surprised if it took them 5-10 wafers per working chip, especially if they don't do any binning or didn't design the chip to fuse off bad sections like Intel CPUs do (a single i3/i5/i7/i9 line is usually the same exact chip, with cores that have too many manufacturing defects fused off, so perfect parts become i9s, less perfect parts become i7s, etc.).
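The standard first-order way to see this is a Poisson defect model: the chance a die is defect-free falls off exponentially with its area. The defect density below is an illustrative guess, not a foundry figure:

    import math

    # P(zero defects on a die) = exp(-D * A), the simplest yield model.
    D = 0.1             # defects per cm^2, assumed for illustration
    small_die = 1.0     # cm^2, a typical CPU-sized die
    wafer_die = 460.0   # cm^2, roughly a wafer-scale die

    print(math.exp(-D * small_die))  # ~0.90: most small dies come out perfect
    print(math.exp(-D * wafer_die))  # ~1e-20: a defect-free wafer never happens

So a wafer-scale part has to tolerate defects rather than avoid them, which is exactly what the replies below describe.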


No, that's not how you do giant chips. Even at the scale of CPUs, there are enough bad parts that it's uneconomical to throw away chips. What Intel/AMD do, for example, is make 4-core chips; if one core doesn't work, they sell it as a 3-core or 2-core part. If some of the cache doesn't work, they sell it as a lower-cache version.

In the case of Cerebras, they disable the bad blocks to get chips that actually work.
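A sketch of why that salvage strategy works, assuming independent per-core defects (the 5% per-core failure rate is made up for illustration):

    from math import comb

    # Probability that at least k of n cores are defect-free, assuming
    # independent defects. The per-core survival rate is an assumption.
    def p_at_least(n: int, k: int, p_good: float) -> float:
        return sum(comb(n, i) * p_good**i * (1 - p_good)**(n - i)
                   for i in range(k, n + 1))

    P_GOOD = 0.95                    # assumed chance one core is defect-free
    print(p_at_least(4, 4, P_GOOD))  # ~0.81: sellable as the full 4-core part
    print(p_at_least(4, 2, P_GOOD))  # ~0.9995: sellable as at least 2-core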


Sure, but you still end up with area that is critical and shared and doesn't yield to this strategy, and with defects bad enough that you can't use the wafer even with portions turned off.

So someone like Cerebras has to both make as little of the die critical as possible and buy much more expensive wafers with lower defect rates, and still gets only moderate yields.


>> buy much more expensive wafers with lower defect rates

How do you do that? Where do I read more about this?


When you buy wafers, particle counts at given particle sizes are specified, along with all kinds of other properties: some determined through non-destructive testing (which can be used for binning) and some through destructive testing (which can be used for characterizing lots).

Better wafers cost more.


But remember the CPU market is extremely price-sensitive and Intel/AMD have huge volumes. Manufacturers of large chips don't necessarily need to take on the extra complication of binning in order to yield.


This wafer-scale tech manages yield issues internally, bypassing bad circuits.

That probably means their wafer yield is 100%.

As for wafer prices: https://anysilicon.com/major-pure-play-foundries-revenue-per...

It's $6k for all wafers smaller than 20nm, which means 16nm is somewhat cheaper than that.


It does not mean 16nm is cheaper than 20nm... the smaller the feature size, the more the wafer costs. Also, this is aggregate revenue at the foundry, and it ignores things like tapeout costs.

Not trying to crap on the tech, but it's not as simple as going to the fab and getting small quantities of full-custom, 100%-yielding chips for $6k each.


>That probably means their wafer yield is 100%.

That doesn't make sense, because defects are an inherent part of the manufacturing process. If defects make 1% of the area unusable, then even a "perfect" yield would deliver at most 99% of the wafer.


I would be VERY shocked to see 100% yield... there's only so much you can do on-chip, though this strongly depends on what your definition of a working chip is :-P


Are you implying that, barring all the unknowns about how the brain works, this chip already has enough capability to simulate a human-sized network at natural speed?


I don't think so, because you need 20000 of them, and one is going to cost on the order of a million quid (I'd guess half a million). Even if you could afford all that, the network connections between them will be the limiting factor.


So... Half the speed? Still good!


Hmm... more like 20000 ants connected in some kind of network, I'd say.


That's still something...

https://youtu.be/Bcs3_b3VXSU



