
"Announced at today's ISC23 keynote, the Aurora genAI model is going to be trained on General text, Scientific texts, Scientific data, and codes related to the domain."

So this thing doesn't actually exist yet?




Not only is the model not yet trained, but the datacenter that will train it has not yet been built: "the Aurora supercomputer is said to launch later this year."


Just an FYI, that's when it'll be completed and access opened up to a wider audience. Many parts are already built and being tested. We're talking about a DOE computer, not an AWS datacenter.


Yes, it does seem like it's at the concept stage.


I can do a decent Daniel Gackle (dang) impression:

This looks like an announcement of an announcement, which isn't on topic for HN. https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...

On HN, there's no harm in waiting for the actual thing: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...


“The 2 Exaflops Aurora supercomputer is a beast of a machine and the system will be used to power the Aurora genAI AI model.”

and instead of a data center with a distributed load of Nvidia cards, they’re going to do one supercomputer? Are these guys intentionally stuck in 2007, or am I missing something?


> and instead of a data center with a distributed load of Nvidia cards

You're missing something. Most models are trained on supercomputers. You still use DDP, but you also use Slurm. A supercomputer is just a many-node machine, but one where you REALLY care about interconnect. It's why the fabric is one of the big points here. I/O is generally your bottleneck in these systems. Each node should be as powerful a system as possible because of this.

A datacenter doesn't need a PB/s connection between machines. Your I/O probably isn't your bottleneck, so the tradeoff between PCIe and Ethernet (InfiniBand or whatever) isn't a big deal, usually because your processes are __independent__ (containers). In a supercomputer, your processes may be parallel, but that doesn't mean they're independent. You'll spend a lot of time writing code to be non-blocking, but you're going to have to map-reduce at some point.
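For a concrete flavor, here's a minimal sketch of DDP training launched under Slurm. This is a hedged illustration, not Aurora's actual stack: the model and loop are placeholders, and I'm assuming a launcher (torchrun, or srun plus a wrapper) that exports RANK/WORLD_SIZE/LOCAL_RANK into the environment; the torch.distributed calls themselves are the real API.

    # train.py -- toy DDP loop (assumed launch setup, e.g.
    # srun --ntasks-per-node=<gpus_per_node> ... via torchrun)
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        rank = int(os.environ["RANK"])              # set by the launcher
        world_size = int(os.environ["WORLD_SIZE"])
        local_rank = int(os.environ["LOCAL_RANK"])

        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(local_rank)

        model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
        opt = torch.optim.SGD(model.parameters(), lr=1e-3)

        for _ in range(10):                          # placeholder training loop
            x = torch.randn(32, 1024, device="cuda")
            loss = model(x).square().mean()
            opt.zero_grad()
            loss.backward()   # gradients get all-reduced across every node here;
            opt.step()        # this collective is where the fabric earns its keep

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

The point of the sketch is the backward() line: that all-reduce is a collective across the whole job, which is exactly why the interconnect, not per-node compute, sets the ceiling.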


Racks full of GPU nodes are called supercomputers. Supercomputers that ran a single instance of an OS kernel are from even farther back, maybe the last century.


It's a datacenter, not a single machine, obviously. All "supercomputers" are datacenters today. They can use Nvidia or other processing units; it doesn't really matter.


Aurora Supercomputer:

- 200 Gb/s dragonfly interconnect fabric

- 10,000 nodes with 2× CPU + 6× GPU with unified memory

- 10 PB of aggregate RAM

- 230 PB of storage

Scale matters in supercomputing. Keeping latency low between everything lets you solve different classes of problems that get no speedup benefit from simply offloading compute to Nvidia cards. It makes a huge difference how everything is connected.
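To put rough numbers on that scale, a quick back-of-envelope from the figures listed above (the per-node split is my arithmetic, not a published spec):

    # Back-of-envelope from the specs quoted above; per-node numbers
    # are derived here, not published figures.
    nodes = 10_000
    gpus_per_node = 6
    aggregate_ram_pb = 10

    total_gpus = nodes * gpus_per_node                  # 60,000 GPUs
    ram_per_node_tb = aggregate_ram_pb * 1_000 / nodes  # ~1 TB of RAM per node

    print(f"total GPUs: {total_gpus:,}")                # total GPUs: 60,000
    print(f"RAM per node: ~{ram_per_node_tb:.0f} TB")   # RAM per node: ~1 TB

Sixty thousand GPUs inside one low-latency fabric is a very different animal from sixty thousand GPUs spread across loosely coupled racks.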


...which is equivalent to just 62x H100 GPUs from Nvidia?


One H100 has the same memory bandwidth as the two CPUs in each of the compute nodes. But the supercomputer has >9000 compute nodes, so 62x H100 GPUs might fall short there.

They also might not be measuring the same thing. H100 GPUs have "3026 TFLOPS":

https://www.pny.com/nvidia-h100

...at FP8. 51 TFLOPS at FP32. Supercomputer lists typically use FP64.
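For an apples-to-apples sketch, assuming the PNY page describes the H100 PCIe card (whose spec sheet I believe lists plain FP64 at roughly 26 TFLOPS, an assumption on my part) and that Aurora's 2 Exaflops is an FP64 peak figure:

    # Rough FP64 apples-to-apples comparison. Assumed, not from the article:
    # H100 PCIe plain FP64 is ~26 TFLOPS; Aurora's "2 Exaflops" is FP64 peak.
    aurora_fp64_tflops = 2_000_000      # 2 EFLOPS expressed in TFLOPS
    h100_fp64_tflops = 26               # assumed H100 PCIe FP64 figure

    print(f"~{aurora_fp64_tflops / h100_fp64_tflops:,.0f} H100s")  # ~76,923

At matching precision you'd need tens of thousands of H100s, not 62.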


Why train on scientific texts when half the papers are BS?


It would be good to train LLMs to distinguish BS articles from good ones, but that's a rather unrealistic expectation.


I can only assume someone is working on a BS detector. And holy fuck, I would pay cash money for a way to identify it without having to delve into it myself. Imagine not having to dive into a paper only to find out that the sample size is so small it shouldn't even have been considered for submission, or, better yet, having online news qualified as 'propaganda beneficial to side x'.


One of the few interesting ideas from Neal Stephenson’s book Fall is the idea that the rich pay people to curate the information they see on the web and in their social feeds. The poor have to wade through all of the machine generated content and propaganda and figure out what is true on their own.

An AI that does that detection would be both wonderful and dangerous.

(That book abandoned its only interesting ideas and went totally off the rails a few chapters later IIRC.)


At what point does increasing dependence on outside validation make your own opinion irrelevant?


In a sense, it already is. My voice is drowned in a sea of SEO-optimized gibberish. Even if I had something interesting and novel to say, an average person would be hard-pressed to 1) find me, 2) convince me to exchange ideas, and 3) pay money for it. There is usually a market for experts with niche appeal, but, well, the market only needs so many.

Still, it is a great question, and part of me wonders about the what-ifs of this evolution.



