Hacker News

3D XPoint is amazing stuff. I'm really bummed out it hasn't taken the SSD market by storm. Even highly technical people fail to understand the critical importance of fast random access for a multitude of problems.



I think at least for now Optane is being marketed at a premium to IOPS-heavy servers and workstations. I too hope it makes its way out into general storage applications, and into more creative roles, like potentially replacing DRAM, if it gets fast enough.

[edit] I'd think Micron would be in a better position to get it there than TI, though, given its standing in the DRAM and NAND markets.


Note that Micron is giving up on 3D XPoint entirely, and TI is only buying the fab and most or all of the current tooling inside it, but not the IP. So TI can't pick up this torch. If 3D XPoint has any significant future, it lies solely with Intel and their Optane-branded products.


It's interesting that TI would want all of that tooling: part of 3DXPoint's problem is that it has some very nonstandard fab steps to build the central structure. (This makes it expensive, and attempts at cost reduction seem from the outside to have failed.) So why would TI want that tooling (if indeed they are getting and keeping it all)?

It makes one wonder about TI's FRAM product line, which, if I'm remembering correctly, involves similar materials to 3DXPoint. But since they're not getting the IP, they're not going to manufacture 3DXPoint. And FRAM is sufficiently different that it is very unlikely to replace NAND. In particular, FRAM reads are destructive and wear down the cell! That may work out for a large, lower-density cell accessed relatively infrequently by a lower-end embedded part, but it will certainly fail for small, high-density cells accessed heavily. FRAM is weird stuff!


I don't know much about the exact tools needed for 3DXPoint, but the value of U.S. semiconductor manufacturing capacity in general has dramatically increased in the past year. It is now seen as a strategic resource, and the government is willing to subsidize it. 3DXPoint production itself, though, is not really strategic to the U.S., since hardly anything uses it that couldn't also use regular flash memory.

If a fab can show that they have the ability to at least partially fill the role of a TSMC in case of a disruption, there are large profits to be had. This may be the context and motivation for the sale.


With NVMe drives, load times for personal use applications are largely CPU-bound. For example, benchmarking game load times comparing a SATA SSD to an NVMe drive over four times as fast resulted in a less than ten percent shorter load time. Going to a tier above that just doesn't make sense for 98 percent of home users.
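A back-of-the-envelope Amdahl's-law sketch shows how that arithmetic works out. The 12% I/O fraction below is an assumed figure, chosen only to illustrate how a drive four times as fast can yield under a 10% improvement:

```python
# Back-of-the-envelope sketch (Amdahl's law): if only a small fraction of
# load time is actually spent waiting on the drive, a much faster drive
# barely helps. The 12% I/O fraction is an assumption, not a measurement.
def load_time(io_fraction, drive_speedup):
    return (1 - io_fraction) + io_fraction / drive_speedup

base = load_time(0.12, 1)  # SATA SSD baseline
fast = load_time(0.12, 4)  # NVMe drive four times as fast
print(f"load time reduced by {(1 - fast / base) * 100:.0f}%")
# prints: load time reduced by 9%
```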


A lot of this is due to legacy software decisions.

To cope with old, slow drives, people use expensive compression algorithms that can take the CPU longer to decompress the data than it would take to read it uncompressed.
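As a rough illustration of that trade-off (assuming zlib-style compression and a synthetic, highly compressible payload; real game assets and codecs will differ), one can measure single-threaded decompression throughput and compare it against an assumed NVMe sequential-read figure:

```python
import time
import zlib

# Illustrative only: synthetic, highly compressible data; the 3500 MB/s
# NVMe read figure below is an assumption, not a measurement.
NVME_READ_MBPS = 3500

payload = b"some compressible asset data " * 200_000  # ~5.8 MB
blob = zlib.compress(payload, level=9)

start = time.perf_counter()
out = zlib.decompress(blob)
elapsed = time.perf_counter() - start

decompress_mbps = len(out) / elapsed / 1e6
print(f"zlib decompress: {decompress_mbps:.0f} MB/s "
      f"(vs ~{NVME_READ_MBPS} MB/s assumed raw NVMe read)")
```

If the printed decompression rate comes out below the drive's raw read rate, the CPU, not the SSD, is the bottleneck on load.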

Also, if we were to directly copy texture data from a fast SSD into video memory, instead of SSD --> RAM --> VRAM, the loading times would see a huge improvement as it bypasses the extra copy.


> Also, if we were to directly copy texture data from a fast SSD into video memory, instead of SSD --> RAM --> VRAM, the loading times would see a huge improvement as it bypasses the extra copy.

The extra trip through CPU memory is not a significant bottleneck. DRAM bandwidth is still much greater than a single SSD's bandwidth. And there's no PCIe traffic saved, because even with P2P DMA from the SSD to the GPU, data still has to pass through the CPU package, where the PCIe root complex lives, to be routed to the GPU. There's a marginal latency improvement, and you can save a bit of the CPU's RAM capacity by not needing as much of it for IO buffering.

You've probably misunderstood the mechanisms behind the claims that new NVMe-based video game consoles enable much faster loading times, and the somewhat-related Microsoft DirectStorage technology (which is still rather ill-defined). P2P DMA is a minor optimization. Offloading decompression work from the CPU to the GPU is a big help, if the GPU has the right hardware to accelerate that decompression.

But by far the biggest improvement comes simply from raising the baseline hardware capabilities that developers target. When game developers can count on running off an SSD, it becomes worthwhile to overhaul the software's approach to IO. The game can then issue requests at the high queue depths needed to extract full performance from flash-based SSDs, and it no longer has to pre-load multiple minutes' worth of content into RAM, because it can count on performing multiple asset fetches per frame.
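A minimal sketch of the queue-depth point, using Python threads against a throwaway temp file (the chunk size, chunk count, and worker count are arbitrary choices; a real engine would use native async IO such as io_uring or DirectStorage rather than threads):

```python
import concurrent.futures
import os
import tempfile

# Sketch: keep many read requests in flight at once instead of issuing
# them one at a time. All sizes and counts here are arbitrary.
CHUNK = 64 * 1024
N_CHUNKS = 64

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(CHUNK * N_CHUNKS))
    path = f.name

def read_chunk(i):
    # Each call is an independent request; with 16 workers, up to 16
    # requests are outstanding at once (queue depth ~16).
    with open(path, "rb") as fh:
        fh.seek(i * CHUNK)
        return fh.read(CHUNK)

with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
    chunks = list(pool.map(read_chunk, range(N_CHUNKS)))

assert b"".join(chunks) == open(path, "rb").read()
os.unlink(path)
```

A serial loop would have the drive idle between requests; the pool keeps its queue filled, which is what flash needs to deliver its rated throughput.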


NVIDIA calls this GPUDirect Storage. You might find their blog post to be useful: https://developer.nvidia.com/blog/gpudirect-storage/




