Hacker News
Nvidia reveals Blackwell B200 GPU, the 'most powerful chip' for AI (theverge.com)
24 points by mfiguiere 8 months ago | hide | past | favorite | 18 comments



The article makes no mention at all of how much VRAM they might have. Other articles state plans for up to 192GB of HBM3e, and a power draw of 1000 watts.


I'm pretty sure the WSE-3 chip from Cerebras is way more powerful. It has 900,000 cores, something like 27 petabytes per second of fabric bandwidth between those cores, and shares all the on-chip memory with them, in one chip. They are developing an 8-exaflop cluster right now.

https://8968533.fs1.hubspotusercontent-na1.net/hubfs/8968533...


You mean the chip that has only 44GB of memory on the chip, with no mention of how fast the chips can talk to each other?


Why cut up a wafer of chips, package each with HBM, put the packages on boards, connect them to CPUs with a fabric, and then tie them all back together with networking chips and cables? Their 3 new clusters are the top 3 biggest AI training platforms in the world. Comparing the WSE-3 against Nvidia's H100: "It's got 52 times more cores. It's got 800 times more memory on chip. It's got 7,000 times more memory bandwidth and more than 3,700 times more fabric bandwidth." But they don't sell the chips; they build the clusters and sell the compute power, except for a couple they built in Dubai.

Plus, newer models are moving from Transformers to Mamba and don't need as much memory, because they keep a fixed-size state of the important information instead of a cache that grows with everything they've seen.
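To make the memory difference concrete, here's a back-of-envelope sketch: a Transformer's KV cache grows linearly with sequence length, while a state-space model like Mamba carries a fixed-size recurrent state. All the dimensions below (layers, heads, state size) are hypothetical round numbers for illustration, not the specs of any real model:

```python
def kv_cache_bytes(seq_len, layers=32, kv_heads=8, head_dim=128, bytes_per=2):
    # Transformer: 2 tensors (K and V) per layer, one entry per token seen.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per

def ssm_state_bytes(layers=32, d_model=4096, state_dim=16, bytes_per=2):
    # State-space model: a fixed-size state per layer, independent of
    # how many tokens have been processed.
    return layers * d_model * state_dim * bytes_per

short_ctx = kv_cache_bytes(1_000)     # grows with context
long_ctx = kv_cache_bytes(100_000)    # 100x the context -> 100x the cache
fixed = ssm_state_bytes()             # constant regardless of context
```

The point is only the scaling behavior: the KV cache at 100k tokens is 100x the cache at 1k tokens, while the recurrent state stays the same size.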


I know it’s only FP4, but 1.4 exaflops in one rack is still crazy.


With scaling and some tricks, FP4 inference can be very close to 16 bit.

And most software out there is still missing out on FP8, lol.
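The "scaling tricks" boil down to something like this: instead of one global quantization range, each small block of weights gets its own scale, so 4-bit integers can cover the block's actual dynamic range. This is a toy pure-Python sketch of blockwise signed 4-bit quantization, not Nvidia's actual FP4 format or any library's implementation:

```python
import random

def quantize_block_int4(xs, levels=7):
    # One shared scale per block maps the block's max magnitude onto the
    # signed 4-bit range [-7, 7]; each value is stored as a small integer.
    scale = max(abs(v) for v in xs) / levels or 1.0
    q = [max(-levels, min(levels, round(v / scale))) for v in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

random.seed(0)
w = [random.gauss(0, 1) for _ in range(64)]   # one block of toy weights
q, s = quantize_block_int4(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Because the scale is chosen per block, outliers in one block don't blow up the precision of every other block, which is a big part of why low-bit inference can stay close to 16-bit quality.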


Doesn't that suggest the training method is flawed?


For training, you need to follow a gradient, and usually the gradient is small enough that you need the precision. Training in 8-bit is fairly new, but Google and now Grok seem to have gotten good results.
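Here's a minimal demonstration of why small gradients need precision: in fp16, the spacing between representable values around 1.0 is about 0.001, so a gradient update smaller than half that spacing vanishes entirely, while fp32 keeps it (a sketch using NumPy's half-precision type):

```python
import numpy as np

# fp16 has ~0.000977 spacing (ulp) at 1.0, so an update of 1e-4 is
# below half an ulp and rounds away to nothing.
w16 = np.float16(1.0)
updated16 = np.float16(w16 + np.float16(1e-4))
lost = bool(updated16 == w16)    # True: the fp16 update was swallowed

# fp32 has ~1.2e-7 spacing at 1.0, so the same update survives.
w32 = np.float32(1.0)
updated32 = np.float32(w32 + np.float32(1e-4))
kept = bool(updated32 != w32)    # True: fp32 kept the update
```

This is why low-precision training schemes keep a higher-precision master copy of the weights or use loss scaling, rather than accumulating tiny gradients directly in the low-bit format.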


So these cards are geared towards inference? There's a Register article showing AMD's new Instinct accelerators have much better FP64.


The article has numbers for both training (I think in FP8) and inference (in FP4).


And Microsoft.


> next-gen NVLink switch that lets 576 GPUs talk to each other, with 1.8 terabytes per second of bidirectional bandwidth.

Well that's a pretty big step up from the 18 NVLink links on the H100...
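Per GPU, the generational jump is simpler than it sounds: H100's NVLink tops out at 900 GB/s bidirectional, and the quoted fifth-gen NVLink figure is 1.8 TB/s, i.e. a clean doubling (figures as published by Nvidia; treat this as back-of-envelope arithmetic, not a benchmark):

```python
# Per-GPU bidirectional NVLink bandwidth, generation over generation.
h100_nvlink_gbs = 900    # H100: 900 GB/s (4th-gen NVLink, 18 links x 50 GB/s)
b200_nvlink_gbs = 1800   # B200: 1.8 TB/s (5th-gen NVLink, as quoted)

speedup = b200_nvlink_gbs / h100_nvlink_gbs   # 2.0x per GPU
```

The bigger step is arguably the switch domain: 576 GPUs in one NVLink fabric versus 256 for the H100-era NVLink Switch System.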



It's terrifying to think about how many watts of power are being spent on generating useless trash


Most human consumption is useless trash by some metric.


This is a bit vague, are you talking about this new GPU/LLMs, social media, industrial society, or something else?


Fun fact: each human uses an average 100W of power and generates mostly useless crap.


> each human uses an average 100W of power

Does that include heating, electricity and fuel spent for the benefit of the human? How about infrastructure? Externalities are huge.



