I was under the impression that PCI-E was perfectly capable of sending notificat...

touisteur · on Sept 14, 2021

Recent datacenter network controlers (Mellanox, marvell) have 'gpu direct' capabilities, so direct interactions with devices with no cpu interaction. I've also seen fpga+network boards do that with success. And with libraries like nccl and 200gbe eth links you could almost forget you have CPUs or network links between.

What I miss is a simple but efficient data queue between cpu and gpu. Everyone's doing manual memory reservation and cudamemcpy, I want an async send (gpu->cpu) with an mpi or socket-like interface. I've seen someone posting stuff on io_uring from gpu code, but just bragging, no code.

Buying Mellanox, and their bluefield dpu (integrated 8 or 16 arm cores in the NIC) stuff I feel, nvidia could probably go the way you're seeing. Haven't seen any Mellanox/NVIDIA tech convergence yet.