"Announced at today's ISC23 keynote, the Aurora genAI model is going to be trained on General text, Scientific texts, Scientific data, and codes related to the domain."
Not only is the model not yet trained, but the datacenter that will train it has not yet been built: "the Aurora supercomputer is said to launch later this year".
Just an FYI, that's when it'll be completed and access opened up to a wider audience. Many parts are already built and being tested. We're talking about a DOE computer, not an AWS datacenter.
“The 2 Exaflops Aurora supercomputer is a beast of a machine and the system will be used to power the Aurora genAI AI model.”
And instead of a data center with a distributed load of Nvidia cards, they’re going to do 1 supercomputer? Are these guys intentionally stuck in 2007 or am I missing something?
> and instead of a data center with a distributed load of Nvidia cards
You're missing something. Most models are trained on supercomputers. You still use DDP, but you also use Slurm. A supercomputer is just a many node machine, but one where you REALLY care about interconnect. It's why the fabric is one of the big points here. I/O is generally your bottleneck in these systems. Each node should be as powerful of a system as possible because of this.
A datacenter doesn't need a PB/s connection between machines. Your I/O probably isn't your bottleneck and so the tradeoff between PCIe and ethernet (infiniband or whatever) isn't a big deal. Usually because your processes are __independent__ (containers). In a supercomputer, your processes may be parallel, but that doesn't mean they're independent. You'll spend a lot of time writing code to be non-blocking but you're going to have to map reduce at some point.
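To make the "parallel but not independent" point concrete, here is a toy sketch in plain Python of what data-parallel training does at each step. It is not real DDP or Slurm code: `local_gradient`, `all_reduce_mean`, `train`, and the tiny dataset are all made up for illustration, and the "nodes" are just list entries. The reduce in the middle of the loop is the part that rides on the interconnect in a real system.

```python
# Toy data-parallel training: each "node" holds a replica of the weight and
# its own data shard. Gradients are computed locally, then averaged across
# all nodes (the all-reduce) before anyone can apply the update.

def local_gradient(w, shard):
    # Gradient of mean squared error for the model y = w * x on one shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    # Stand-in for the collective: every participant receives the same mean.
    mean = sum(values) / len(values)
    return [mean] * len(values)

def train(shards, steps=200, lr=0.01):
    weights = [0.0] * len(shards)  # one replica per hypothetical node
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        synced = all_reduce_mean(grads)  # every node waits here
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights

# Two shards of data drawn from y = 3x (illustrative values only).
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
final = train(shards)
print(final)
```

Every replica ends up with the same weight because none of them can move on until the averaged gradient arrives, so the slowest link sets the pace for all nodes.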
Racks full of GPU nodes are called supercomputers. Supercomputers that ran a single instance of an OS kernel are from even farther back, maybe the last century.
It's a datacenter, not a single machine obviously. All "supercomputers" are datacenters today. They can use Nvidia or other processing units, it doesn't really matter.
- 10,000 nodes with 2xCPU + 6xGPU with unified memory
- 10PB of aggregate RAM
- 230PB of storage
Scale matters in supercomputing. Keeping latency low between everything allows you to solve different classes of problems that get no speedup from simply offloading compute to Nvidia cards. It makes a huge difference how everything is connected.
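For a rough sense of what those totals mean per node, the arithmetic can be sketched like this. It only divides the rounded figures from the list above, assumes decimal units (1 PB = 1000 TB), and assumes RAM and storage are spread evenly across nodes, which is a simplification since storage usually lives in a separate filesystem.

```python
# Back-of-the-envelope per-node figures from the rounded numbers in the list.
NODES = 10_000
CPUS_PER_NODE = 2
GPUS_PER_NODE = 6
RAM_PB = 10
STORAGE_PB = 230
TB_PER_PB = 1000  # assumption: decimal units

total_cpus = NODES * CPUS_PER_NODE
total_gpus = NODES * GPUS_PER_NODE
ram_per_node_tb = RAM_PB * TB_PER_PB / NODES
storage_per_node_tb = STORAGE_PB * TB_PER_PB / NODES

print(f"CPUs: {total_cpus}, GPUs: {total_gpus}")
print(f"RAM per node: {ram_per_node_tb} TB")
print(f"Storage per node (if evenly split): {storage_per_node_tb} TB")
```

That works out to about 1 TB of RAM per node and 60,000 GPUs in total, all of which have to talk to each other over the fabric.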
One H100 has the same memory bandwidth as the two CPUs in each of the compute nodes. But the supercomputer has >9000 compute nodes, so on CPU memory bandwidth alone it is comparable to >9000 H100s, and 62x H100 GPUs might fall short there.
They also might not be measuring the same thing. H100 GPUs have "3026 TFLOPS":
I can only assume someone is working on a BS detector. And holy fuck, I would pay cash money for a way to identify it without having to delve into it myself. Imagine not having to dive into a paper only to find out that the population sample is so low that it shouldn't even be considered for submission, or better yet, having online news labeled as 'propaganda beneficial to side x'.
One of the few interesting ideas from Neal Stephenson’s book Fall is the idea that the rich pay people to curate the information they see on the web and in their social feeds. The poor have to wade through all of the machine generated content and propaganda and figure out what is true on their own.
An AI that does that detection would be both wonderful and dangerous.
(That book abandoned its only interesting ideas and went totally off the rails a few chapters later IIRC.)
In a sense, it already is. My voice is drowned in a sea of SEO-optimized gibberish. Even if I had something interesting and novel to say, an average person would be hard pressed to 1) find me 2) convince me to exchange ideas 3) pay money for it. There is usually a market for experts with niche appeal, but, well, the market only needs so many.
Still, it is a great question and part of me wonders about the whatifs of this evolution.
So this thing doesn't actually exist yet?