LUMI to become the world's fastest supercomputer (lumi-supercomputer.eu)
31 points by clon on Oct 21, 2020 | 9 comments



The title is not really right: when finished, it will be in the top 5.

> When LUMI’s operations start next year, it will be one of the world’s fastest supercomputers.

> The peak performance of LUMI is an astonishing 552 petaflop/s, meaning 552 × 10^15 floating point operations per second. This figure makes LUMI one of the world's fastest supercomputers. For comparison, the world's fastest computer today (Fugaku in Japan) reaches 513 petaflop/s and the second fastest (Summit in the US) 200 petaflop/s.

But it's good to see the heat used for district heating, and that's probably the first supercomputer which will not be used for calculating aging atomic weapon yields.

edit: first time I've heard of supercomputers not being used for calculating aging atomic weapon yields. Thanks.


To elaborate a bit, at least three exascale computers that I am aware of are planned to be running by the end of 2021 (one in China and two in the US), with Frontier at ORNL estimated to provide about 3x the compute power of LUMI. It's good to see the supercomputer arms race going strong. There are so many problems in the physical sciences that just can't be solved without larger and larger supercomputer-style clusters. I know this is a kind of ill-formed question, but does anyone in the neuro or bio fields know how far away we are from being able to convincingly simulate a brain (human or other mammal), in terms of raw compute power?


Summit is used entirely for unclassified computations.


Like most supercomputers (outside the Tri-Labs?), of course.


Folding@home hit 2.43 exaflops earlier this year [0]. I'm surprised massively distributed computing isn't being looked into with more fervor. It looks like it had somewhere around 670,000 GPUs running in parallel with ~1.4 million CPUs.

Users would need to be incentivised to install distributed computing software, but I think it has promise.

[0]: https://archive.vn/20200412111010/https://stats.foldingathom...


Distributed computing has high flops, but very low connection speeds.

LUMI has 200 Gbit connections between nodes, or roughly 25 GB/s: faster than PCIe 3.0 x16 (15.8 GB/s).
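
For the conversion (the PCIe figures are from the PCIe 3.0 spec, not from the article):

    200 Gbit/s ÷ 8 bits/byte                     = 25 GB/s
    PCIe 3.0 x16: 8 GT/s × 16 lanes × 128b/130b  ≈ 15.75 GB/s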

In effect, supercomputers can share "remote memory" as if it were local (the RDMA protocol). As such, you can treat the entire RAM space as if it were unified: your 64-bit pointers can span the whole supercomputer, and your data structures can be distributed yet always accessed through a 200 Gbit pipeline.
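
A minimal sketch of that programming model, using MPI one-sided communication (which HPC fabrics typically implement with RDMA); the buffer size here is arbitrary, and the program assumes at least 2 ranks:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every rank exposes a window of its local RAM to all other ranks. */
        const int N = 1024;
        double *buf;
        MPI_Win win;
        MPI_Win_allocate(N * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &buf, &win);
        for (int i = 0; i < N; i++) buf[i] = (double)rank;

        MPI_Win_fence(0, win);
        /* Rank 0 reads straight out of rank 1's memory over the fabric;
           rank 1 posts no matching receive (this is the RDMA-style get). */
        double remote[1024];
        if (rank == 0)
            MPI_Get(remote, N, MPI_DOUBLE, /*target rank*/ 1, 0,
                    N, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        if (rank == 0) printf("read %.0f from rank 1's RAM\n", remote[0]);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }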

--------

As it turns out, you need very fast I/O to truly sustain supercomputing workloads. A lot of these workloads turn out to be just crazy-big matrix-multiplication problems that require a fair amount of coordination between all the nodes.

You can't split a problem like that across distributed compute. At best, you can only hand out problems that fit on one machine (under 32 GB of RAM). In contrast, these supercomputers can work on 100+ TB shared-RAM problems with 100,000+ TB of shared storage (such as simulating quantum effects). The shared storage is accessed at 2 TB/s and accelerated with flash SSD cache layers.
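
To make the coordination point concrete, here is a toy sketch (my own illustration, not any real supercomputer code) of a distributed matrix-vector multiply in MPI. Every rank has to exchange its slice of the vector with every other rank before it can compute anything; that collective is cheap on a 200 Gbit fabric and ruinous over residential internet links:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Each rank owns a block of rows of A and a slice of x, but the
       multiply needs the WHOLE x on every rank, hence the all-to-all
       exchange. Assumes n is divisible by the rank count. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 4096;            /* global dimension (illustrative) */
        const int rows = n / size;     /* rows owned by this rank */
        double *A = malloc((size_t)rows * n * sizeof(double));
        double *x = malloc((size_t)n * sizeof(double));
        double *x_local = malloc((size_t)rows * sizeof(double));
        double *y = malloc((size_t)rows * sizeof(double));
        for (size_t i = 0; i < (size_t)rows * n; i++) A[i] = 1.0;
        for (int i = 0; i < rows; i++) x_local[i] = 1.0;

        /* The communication step: assemble the full x on every rank. */
        MPI_Allgather(x_local, rows, MPI_DOUBLE, x, rows, MPI_DOUBLE,
                      MPI_COMM_WORLD);

        /* The compute step: purely local once x has arrived. */
        for (int i = 0; i < rows; i++) {
            y[i] = 0.0;
            for (int j = 0; j < n; j++) y[i] += A[(size_t)i * n + j] * x[j];
        }

        if (rank == 0) printf("y[0] = %.0f (expect %d)\n", y[0], n);
        free(A); free(x); free(x_local); free(y);
        MPI_Finalize();
        return 0;
    }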

---------

As some people say: the job of a supercomputer is to turn everything into an I/O-constrained problem. As such, a HUGE amount of money is poured into making I/O as fast as reasonably possible. You don't want your PFlop-scale machine to be throttled by slow storage or communications.


Broken record time, but it's the latency that's most important, not the bandwidth, which you might get even with generic Ethernet. There are also trade-offs in the fabric topology; I don't remember if Cray is still using Dragonfly.
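
The standard way to see the latency side is an MPI ping-pong microbenchmark; a minimal sketch (run with exactly 2 ranks) that times round trips of a 1-byte message, so bandwidth is irrelevant:

    #include <mpi.h>
    #include <stdio.h>

    /* Small-message round-trip timing: microseconds on an HPC fabric,
       milliseconds over the internet -- orders of magnitude apart even
       when the nominal bandwidths look comparable. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int iters = 10000;
        char byte = 0;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("one-way latency ~ %.2f us\n",
                   (t1 - t0) / (2.0 * iters) * 1e6);
        MPI_Finalize();
        return 0;
    }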


Depends on the problem. Codes that run on supercomputers need low latency communication and access to a lot of data. Think physical simulations.


Anyone know, or have ballpark guesses, what % of physics computing is done on big HPC systems as opposed to medium/small HPC (or non-HPC) clusters and single machines? E.g., counting by the number of simulated experiments that get published or written up.



