
Cerebras WSE-3 contains 4 trillion transistors, delivers 8 exaflops, and has 20 PB/s of bandwidth. It has 900,000 cores, 62 times as many as an H100. I wonder if the WSE-3 can compete on price/performance, though. Interesting times!
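
A quick Python sanity check on the core-count ratio (assuming the comparison is against the H100 PCIe's 14,592 CUDA cores; the SXM variant has 16,896, which would make it ~53x):

    wse3_cores = 900_000       # Cerebras WSE-3 core count
    h100_cuda_cores = 14_592   # H100 PCIe (assumption; SXM has 16,896)
    print(wse3_cores / h100_cuda_cores)  # ~61.7, i.e. the "62 times" figure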



Is anyone actually using those WSEs in anger yet? They're on their third generation now, but as far as I can tell the discussion of each generation consists of "Cerebras announces new giant chip" and then radio silence until they announce the next giant chip.


The problem is software. You can put out an XYZ-trillion-transistor monster chip that beats anything hardware-wise, but it's going nowhere if you don't have the tooling and the massive community (like Nvidia has) to actually do real AI work.


Unlikely. They cost so much that nobody is going to do research on them; at best it's porting existing models. And they're so different from GPUs that the porting effort is going to be enormous.

They also suffer from the global optimisation problem for laying out calculations, so compile times are going to be insane.

Their WSE technology is also already obsolete: Tesla's chip does it in a much more logical and cost-effective way.


They sold some. That's not strictly the same as anyone using them, but there's a decent chance some code is running on those machines.


A Cerebras CS-2 is at the Pittsburgh Supercomputing Center. Not sure if they ordered a 3.


> 900,000 cores, 62 times as many as an H100

Arguably more than that. CUDA cores are more like SIMD lanes than CPU cores, and Cerebras's usage of 'core' is closer to the latter. Since Cerebras cores have 4-wide tensor ops, there are arguably 3.6M CUDA-equivalent cores.
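
The arithmetic, as a quick sketch (the 4-wide figure is just the tensor-op width mentioned above, treated as 4 SIMD lanes per core):

    cerebras_cores = 900_000
    lanes_per_core = 4  # 4-wide tensor ops, counted as 4 SIMD lanes each
    print(cerebras_cores * lanes_per_core)  # 3,600,000 "CUDA-equivalent" cores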


> 8 exaflops

Your number is off by 64x.

It can do 125 petaflops at FP16.

https://www.tomshardware.com/tech-industry/artificial-intell...
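
Where the 64x comes from:

    claimed = 8e18    # the 8-exaflop headline number
    actual = 125e15   # 125 petaflops at FP16
    print(claimed / actual)  # 64.0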


9 trillion flops per core? That's... mind-boggling. Is that real?

And 9 trillion flops per core from just 4.4 million transistors per core. That sounds a bit too good to be true.
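
Both per-core figures fall out of the (inflated) headline numbers:

    transistors = 4e12
    flops_claimed = 8e18  # the disputed 8-exaflop figure
    cores = 900_000
    print(transistors / cores)    # ~4.44e6 transistors per core
    print(flops_claimed / cores)  # ~8.9e12, i.e. ~9 teraflops per core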


No, the real performance is 125 petaflops at FP16. That works out to roughly 139 gigaflops per core.

You would need 64 of these to get 8 exaflops.

https://www.tomshardware.com/tech-industry/artificial-intell...
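
The same arithmetic with the real figure:

    fp16_flops = 125e15
    cores = 900_000
    print(fp16_flops / cores)  # ~1.389e11, i.e. ~139 gigaflops per core
    print(8e18 / fp16_flops)   # 64.0 systems needed to reach 8 exaflops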


How's the bandwidth on a single core-to-core link?



