
Interesting things I’ve learned about this chip after a little sleuthing.

The CEO implies on the FYI podcast that it can handle models of at most ~4 billion parameters per wafer. Respectable, but not as large as I assumed would be possible when I first read about the scale of the chip.

CEO claims model parallelism will actually work well with these devices. Would be intriguing to know the limits of this.

Based on the cooling requirements, a clock speed of at least 1 GHz appears likely. If we take the heretical position that 1 parameter is >= 1 synapse, 4 billion parameters is about the size of a bee brain. 1 GHz is about 5 million times faster than a bee brain fires.

It would take about 20000 such chips to simulate a human-sized network. Not economical, and presumably the model parallelism would break long before this. But it is interesting to note that under the not-implausible 1 parameter >= 1 synapse assumption, we are only a few orders of magnitude away from human-sized networks training 5 million times real-time.
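Rough numbers in Python, for the arithmetic above. The synapse counts and the biological firing ceiling are my own assumptions, not anything from the podcast:

    # Back-of-envelope for the claims above. All biology numbers are
    # assumptions (rough textbook-scale estimates), not podcast figures.
    PARAMS_PER_WAFER = 4e9     # CEO's claimed max model size per wafer
    BEE_SYNAPSES = 1e9         # assumed bee brain, order of magnitude
    HUMAN_SYNAPSES = 1e14      # assumed human brain, ~100 trillion synapses
    CHIP_CLOCK_HZ = 1e9        # the ~1 GHz inferred from the cooling
    BIO_RATE_HZ = 200          # assumed peak biological firing rate

    print(PARAMS_PER_WAFER / BEE_SYNAPSES)    # ~4: roughly bee-brain scale
    print(CHIP_CLOCK_HZ / BIO_RATE_HZ)        # 5e6: the "5 million times faster"
    print(HUMAN_SYNAPSES / PARAMS_PER_WAFER)  # 25000: close to the ~20000 above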




OK, a stupid question. Since this ANN is ~5 million times as fast as animal neurons, couldn't it be "multiplexed" to simulate a much larger network in biological time (with massive state-storage memory, I presume)? I realize there will be propagation dependencies, but each dependency layer could be precomputed before the next, I would guess?

Or is there a reason that this won't work (or at least won't be worth it) for ANN structures?


> Or is there a reason that this won't work (or at least won't be worth it) for ANN structures?

If all the weights and activations fit in on-chip memory, then you can do calculations at close to 100% efficiency. If you want to simulate a 20k times larger network, you also need to transfer the 4 billion parameters per iteration, which would take significantly longer. In other words, you would be seriously bottlenecked.
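To put rough numbers on that bottleneck (the bandwidth and throughput figures here are illustrative guesses, not Cerebras specs):

    # Why multiplexing a much larger model through one wafer stalls on
    # memory traffic. Bandwidth/throughput figures are illustrative guesses.
    PARAMS = 4e9            # parameters resident on the wafer at once
    BYTES_PER_PARAM = 2     # 16-bit weights
    OFFCHIP_BW = 100e9      # assumed 100 GB/s off-chip bandwidth
    ONCHIP_OPS = 1e15       # assumed ~1 peta-op/s with weights on-chip

    compute_time = 2 * PARAMS / ONCHIP_OPS   # ~2 ops per weight per pass
    transfer_time = PARAMS * BYTES_PER_PARAM / OFFCHIP_BW

    print(compute_time)                  # ~8e-6 s: the math itself
    print(transfer_time)                 # ~8e-2 s: loading the next model slice
    print(transfer_time / compute_time)  # ~1e4: ~10000x longer moving weights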


A synapse is responsible both for storing its weight, and for operating on its weight. Therefore, model size and throughput are interrelated numbers. If a synapse in a bee brain fires at 10 Hz on average (my guess), total events per second is 10 times the brain ‘capacity’. Max firing rate will be a lot larger, since most neurons are silent most of the time.

In silicon we separate out the storage from the compute. Generally we have much more storage than we have compute, and we route the data to the compute units as we need computation. Therefore ALUs have aggregate throughput of only a small fraction of the model capacity per cycle; at 400,000 cores with, say, 8 16-bit operations per cycle each, that is ‘only’ ~10^6 operations per cycle.

A better way to get a measure of computational throughput is to compare those values. Numbers are hard to find, but 10^10 synaptic events/s for a bee and 10^15 edge calculations/s for Cerebras seems the right sort of ballpark. That's a difference of 10^5, a little less than your estimate of 10^7.
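Spelling the comparison out (all of these are the ballpark figures above):

    # Bee synaptic events/s vs wafer edge calculations/s, ballpark.
    BEE_SYNAPSES = 1e9      # assumed bee brain size
    BEE_RATE_HZ = 10        # assumed average firing rate
    bee_events = BEE_SYNAPSES * BEE_RATE_HZ       # ~1e10 events/s

    CORES = 400_000
    OPS_PER_CYCLE = 8       # assumed 16-bit ops per core per cycle
    CLOCK_HZ = 1e9          # assumed ~1 GHz
    wafer_ops = CORES * OPS_PER_CYCLE * CLOCK_HZ  # ~3e15 ops/s

    print(wafer_ops / bee_events)   # ~3e5: the ~10^5 gap, not 10^7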


If we could do a full-wafer analog neural network, that's probably another two orders of magnitude.

That means we'll need 200 such chips.

Why is the wafer so expensive? Buying a manufactured 16nm wafer from TSMC is, I think, $4k-5k.

And there are still a few nodes left. While cost per gate doesn't go down much, they could solve some of the external bandwidth issues.

But all those things are really expensive and very risky. Who knows...


$4-5k per wafer (sounds very low to me) doesn't say anything about yield losses from manufacturing defects. The smaller a chip is, the higher your yield per wafer, because each defect destroys fewer chips.

When you have only a few giant chips per wafer, each defect becomes devastating, taking out a large fraction of the wafer. Defects are so common I wouldn't be surprised if it took them 5-10 wafers per working chip, especially if they don't do any binning or didn't design the chip to fuse off bad sections like Intel CPUs do (a single i3/i5/i7/i9 line is usually the same exact chip, with cores that have too many manufacturing defects fused off, so perfect parts become i9s, less perfect parts become i7s, etc.).
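The standard first-order way to see this is a Poisson defect model: the chance a die is defect-free falls off exponentially with its area. The defect density below is an illustrative guess, not a foundry figure:

    import math

    # P(zero defects on a die) = exp(-D * A), the simplest yield model.
    D = 0.1             # defects per cm^2, assumed for illustration
    small_die = 1.0     # cm^2, a typical CPU-sized die
    wafer_die = 460.0   # cm^2, roughly a wafer-scale die

    print(math.exp(-D * small_die))  # ~0.90: most small dies come out perfect
    print(math.exp(-D * wafer_die))  # ~1e-20: a defect-free wafer never happens

So a wafer-scale part has to tolerate defects rather than avoid them, which is exactly what the replies below describe.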


No, that's not how you do giant chips. Even at the scale of CPUs, there are enough bad parts that it's uneconomical to throw away chips. What Intel/AMD do, for example, is make 4-core chips; if one core doesn't work, they sell it as a 3-core or 2-core part. If some of the cache doesn't work, they sell it as a lower-cache version.

In the case of Cerebras, they disable the bad blocks to get chips that actually work.
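A sketch of why that salvage strategy works, assuming independent per-core defects (the 5% per-core failure rate is made up for illustration):

    from math import comb

    # Probability that at least k of n cores are defect-free, assuming
    # independent defects. The per-core survival rate is an assumption.
    def p_at_least(n: int, k: int, p_good: float) -> float:
        return sum(comb(n, i) * p_good**i * (1 - p_good)**(n - i)
                   for i in range(k, n + 1))

    P_GOOD = 0.95                    # assumed chance one core is defect-free
    print(p_at_least(4, 4, P_GOOD))  # ~0.81: sellable as the full 4-core part
    print(p_at_least(4, 2, P_GOOD))  # ~0.9995: sellable as at least 2-core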


Sure, but you still end up with area that is critical and shared and doesn't yield to this strategy, and with defects bad enough that you can't use the wafer even with portions turned off.

So someone like Cerebras has to both make as little of the die critical as possible and buy much more expensive wafers with lower defect rates, and still gets only moderate yields.


>> buy much more expensive wafers with lower defect rates

How do you do that? Where do I read more about this?


When you buy wafers, particle counts at given particle sizes are specified, along with all kinds of other properties: some determined through non-destructive testing (which can be used for binning) and some through destructive testing (which can be used for characterizing lots).

Better wafers cost more.


But remember the CPU market is extremely price-sensitive and Intel/AMD have huge volumes. Manufacturers of large chips don't necessarily need to take on the extra complication of binning in order to yield.


This wafer-scale tech manages yield issues internally, bypassing bad circuits.

That probably means their wafer yield is 100%.

As for wafer prices: https://anysilicon.com/major-pure-play-foundries-revenue-per...

It's $6k for all wafers smaller than 20nm, which means 16nm is somewhat cheaper than that.


It does not mean 16nm is cheaper than 20nm... the smaller the feature size, the more the wafer costs. Also, this is aggregate revenue at the foundry, and it ignores things like tapeout costs.

Not trying to crap on the tech, but it's not as simple as going to the fab and getting small quantities of full-custom, 100%-yielding chips for $6k each.


>That probably means their wafer yield is 100%.

That doesn't make sense, because defects are an inherent part of the manufacturing process. If defects make 1% of the area unusable, then even a "perfect" yield would deliver at most 99% of the wafer.


I would be VERY shocked to see 100% yield... there's only so much you can do on-chip, though this strongly depends on what your definition of a working chip is :-P


Are you implying that, barring all the unknowns about how the brain works, this chip already has enough capability to simulate a human-sized network at natural speed?


I don't think so, because you need 20000 of them, and one is going to cost on the order of a million quid (I'd guess half a million). Even if you could afford all that, the network connections between them will be the limiting factor.


So... Half the speed? Still good!


Hmm... more like 20000 ants connected in some kind of network, I'd say.


That's still something...

https://youtu.be/Bcs3_b3VXSU



