
It felt like we were on 3 for a long time, and then all of a sudden got 4 through 6 (and soon 7) in quick succession. I'd be curious to know what motivated that - maybe GPGPU taking off?





The big use cases are inter-computer communication and NVMe SSDs. PCIe 5.0 x16 gets you enough bandwidth for 400 Gbps Ethernet; 7.0 x16 will cover 1.6 Tbps. For SSDs, it's the difference between needing four lanes and one lane to saturate a drive's bandwidth. A modern 2U server at this point can have an ungodly amount of incredibly fast storage, and can expose all that data to everyone else without a bandwidth bottleneck.
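
A quick back-of-envelope check on those figures (my own sketch in Python; the GB/s values are the commonly quoted raw per-direction x16 rates, ignoring encoding and protocol overhead):

    # Approximate raw per-direction bandwidth of an x16 slot, per PCIe generation,
    # converted to Gb/s and compared against Ethernet line rates.
    PCIE_X16_GBPS = {
        "PCIe 3.0": 16 * 8,    # ~128 Gb/s
        "PCIe 4.0": 32 * 8,    # ~256 Gb/s
        "PCIe 5.0": 64 * 8,    # ~512 Gb/s
        "PCIe 6.0": 128 * 8,   # ~1 Tb/s
        "PCIe 7.0": 256 * 8,   # ~2 Tb/s
    }

    for nic_gbps in (400, 800, 1600):
        fits = [gen for gen, bw in PCIE_X16_GBPS.items() if bw >= nic_gbps]
        print(f"{nic_gbps}GbE needs at least {fits[0]} in an x16 slot")
    # -> 400GbE needs at least PCIe 5.0, 800GbE needs 6.0, 1.6TbE needs 7.0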

Definitely data centre usage of some sort.

AI/GPU communication is definitely driving it forward now. It's a race to see how quickly you can move data around.

Really? I hadn't heard of GPU or GPGPU pushing bandwidth recently. Networking certainly does: 400GbE cards exceed PCIe 4.0 x16 bandwidth, 800GbE is here, and 1.6TbE is apparently in the works. Disk too, though: just because a single disk (or even a network PHY) may not max out a PCIe slot doesn't mean you want to dedicate more lanes than necessary to it, because you likely want a bunch of them.

We are at PCIe 5.0 in the Dell XE9680. We add in 8x 400G cards and they talk directly to the network / the 8x GPUs (via RoCEv2).

800G Ethernet is here at the switch level (the Dell Z9864F-ON is beautiful... 128 ports of 400G), but not yet at the server/NIC level; that comes with PCIe 6.0. We are also limited to 16 chassis / 128 GPUs in a single cluster right now.

NVMe is getting faster all the time, but is pretty standard now. We put 122TB into each server, so that enables local caching of data, if needed.

All of this is designed for the highest speed available today on the various buses where data is transferred.


I wonder if any of this trickles down into cloud providers reducing costs again. After all, if we have zounds of fast storage, surely slower storage becomes cheaper?

We do not directly compete with them, as we are more of a niche solution for businesses that want their own private cloud and do not want to undertake the many millions in capex to build and deploy their own supercomputer clusters. As such, our offerings should not have an impact on their pricing. But who knows… maybe long term we will. Hard to say.

NVLink 4.0, used to connect H100 GPUs today, is almost as fast per lane as PCIe 7.0 (12.5 GB/s vs 16 GB/s per lane, per direction). By the time PCIe 7.0 is available, I'm sure NVLink will be much faster. So, yeah, GPUs are currently the most bandwidth-hungry devices on the market.
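
The per-lane arithmetic behind those two figures, for anyone curious (a rough sketch; raw signaling rates only, no protocol overhead, and the PCIe 7.0 rate is the announced target):

    # NVLink 4 signals at ~100 Gb/s per lane (PAM4); PCIe 7.0 targets 128 GT/s
    # per lane. Dividing by 8 gives approximate per-direction GB/s per lane.
    print(100 / 8)   # NVLink 4 : 12.5 GB/s per lane
    print(128 / 8)   # PCIe 7.0 : 16.0 GB/s per lane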

Will the lead time still be 50+ weeks though? My guess is yes.


