> Also, if we were to directly copy texture data from a fast SSD into video memory, instead of SSD --> RAM --> VRAM, the loading times would see a huge improvement as it bypasses the extra copy.
The extra trip through CPU memory is not a significant bottleneck. DRAM bandwidth is still much greater than a single SSD's bandwidth. And there's no PCIe traffic saved, because even with P2P DMA from the SSD to the GPU, the data still has to enter the CPU package, where the PCIe root complex lives, in order to be routed to the GPU. There's a marginal latency improvement, and you can save a bit of the CPU's RAM capacity by not needing to use as much of it for IO buffering.
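To put rough numbers on the bandwidth gap, here is a back-of-envelope sketch in Python. The two configurations (a PCIe 4.0 x4 NVMe SSD and dual-channel DDR4-3200) are my own illustrative assumptions, not anything specific to a console or to DirectStorage, and these are theoretical link and bus peaks rather than measured throughput.

```python
# Back-of-envelope peak bandwidth comparison (theoretical maxima, not measurements).

def pcie_peak_gb_per_s(gt_per_s: float, lanes: int) -> float:
    """Peak one-direction PCIe rate in GB/s, assuming 128b/130b encoding (PCIe 3.0 and later)."""
    return gt_per_s * lanes * (128 / 130) / 8

def dram_peak_gb_per_s(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Peak DRAM rate in GB/s for 64-bit (8-byte) channels."""
    return mt_per_s * channels * bus_bytes / 1000

# Assumed example hardware: PCIe 4.0 (16 GT/s per lane) x4 SSD, dual-channel DDR4-3200.
ssd = pcie_peak_gb_per_s(16.0, 4)
dram = dram_peak_gb_per_s(3200, 2)

print(f"SSD link peak: {ssd:.1f} GB/s")
print(f"DRAM peak:     {dram:.1f} GB/s")
print(f"DRAM / SSD:    {dram / ssd:.1f}x")
```

Bouncing data through RAM costs one write plus one read, so the DRAM traffic is about twice the SSD's rate, which under these assumptions still leaves plenty of headroom.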
You've probably misunderstood the mechanisms behind the claims that new NVMe-based video game consoles enable much faster loading times, and the somewhat-related Microsoft DirectStorage technology (which is still rather ill-defined). P2P DMA is a minor optimization. Offloading decompression work from the CPU to the GPU is a big help, if the GPU has the right hardware to accelerate that decompression.
But by far the biggest improvement comes simply from raising the baseline hardware capabilities that developers target. When game developers can count on running off an SSD, it becomes worthwhile to overhaul the software's approach to IO so that the game issues requests at the high queue depths needed to extract full performance from flash-based SSDs. The game also no longer has to pre-load multiple minutes' worth of content into RAM, because it can count on being able to perform multiple asset fetches per frame.
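As a rough illustration of what "high queue depth" means in practice, here is a minimal Python sketch that keeps many reads in flight at once instead of issuing them one at a time. The `read_chunks` helper and its parameters are hypothetical names of mine, and a thread pool is only a stand-in: a real engine would more likely use a native asynchronous IO interface. `os.pread` is Unix-only, and on a small, freshly written file the reads will mostly hit the page cache, so treat the timings as a demonstration of the pattern and not as a benchmark.

```python
import os
import random
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def read_chunks(path, offsets, size, queue_depth):
    """Read `size` bytes at each offset, keeping up to `queue_depth` reads in flight."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # os.pread releases the GIL while blocked in the kernel, so the
        # worker threads really do have overlapping requests outstanding.
        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            return list(pool.map(lambda off: os.pread(fd, size, off), offsets))
    finally:
        os.close(fd)

if __name__ == "__main__":
    chunk = 4096
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "assets.bin")  # hypothetical asset file
        with open(path, "wb") as f:
            f.write(os.urandom(4 * 1024 * 1024))
        # Random chunk-aligned offsets, mimicking scattered asset fetches.
        offsets = [random.randrange(0, 1024) * chunk for _ in range(2000)]
        for depth in (1, 32):
            start = time.perf_counter()
            read_chunks(path, offsets, chunk, depth)
            elapsed = time.perf_counter() - start
            print(f"queue depth {depth:2d}: {elapsed * 1000:.1f} ms")
```

The point is the shape of the code: with a queue depth of 1 every fetch waits on the previous one, which suits a hard drive but leaves most of a flash SSD's internal parallelism idle.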