
This would be incredibly useful for distributed machine learning - imagine a TensorFlow implementation that almost entirely bypasses the CPU.



For most applications, getting training images onto the GPU isn't even close to being the bottleneck. Training the Inception model, for example, handles batches of 32 images (299x299x3) in about 1.2 seconds. That's a pretty boring ~300KB * 32 ≈ 10MB per batch, or under 10MB/sec of read bandwidth off the SSD for ImageNet. Even dealing with "real" images is probably only 10x that, which is trivial to get over the PCI bus.
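Spelled out, using the same ballpark figures:

    # Back-of-the-envelope: ~300KB per ImageNet JPEG, 32 images per batch,
    # ~1.2s per training step.
    bytes_per_image = 300 * 1024
    batch_size = 32
    step_time_s = 1.2

    mb_per_sec = bytes_per_image * batch_size / step_time_s / 1e6
    print(mb_per_sec)  # ~8 MB/sec of sustained read bandwidth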

The question would be whether we can turn the crank on the design of models to make it possible to do something really cool given access to very high-speed SSD storage.


I was thinking the same thing, but is SSD to GPU faster than RAM to GPU? In many (not all) cases you buy a tonne of RAM and load your entire dataset into memory once and then iterate over it as necessary.

You also lose the flexibility of doing any sort of data modification or augmentation. One domain where your data usually doesn't fit in RAM is image recognition, but there you often want to apply random flips, crops, and hue changes before training to make the neural net less sensitive to those transformations, which you can't really do with this.


SSD is not as fast as RAM, but it's much, much cheaper - on the order of 10x less per gigabyte. With an SSD-to-GPU bridge you can have fast access to a multi-TiB training set on a single machine.

Data pre-processing is indeed an issue, but hue adjustment/flipping/cropping could be implemented as TensorFlow operations, on the GPU. Similarly with input decompression - it would either have to be done on the GPU, or the data would have to be stored uncompressed.
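Roughly what I have in mind, using stock tf.image ops (whether each of these actually has a GPU kernel is a separate question):

    import tensorflow as tf

    def augment(image):
        # `image` is a decoded float32 [height, width, 3] tensor already on
        # the device; these are all ordinary graph ops.
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_hue(image, max_delta=0.05)
        image = tf.random_crop(image, size=[299, 299, 3])
        return image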


As long as the average bandwidth isn't a bottleneck, it's not going to matter - at worst, you're just going to need to prefetch (and due to SSD latency, that's likely optimal regardless).
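A minimal sketch of that kind of prefetching - just a background thread filling a bounded queue; load_batch and train_step here are hypothetical stand-ins for the actual I/O and training step:

    import threading
    import queue

    def prefetcher(load_batch, depth=4):
        # load_batch() reads and decodes one batch from the SSD; up to
        # `depth` batches are kept buffered ahead of the GPU.
        q = queue.Queue(maxsize=depth)

        def worker():
            while True:
                q.put(load_batch())  # blocks once the buffer is full

        threading.Thread(target=worker, daemon=True).start()
        return q

    # The training loop just pulls from the queue, so SSD reads overlap
    # with GPU compute:
    #   batches = prefetcher(load_batch)
    #   while training:
    #       train_step(batches.get())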


RAM-to-GPU is always faster than SSD-to-GPU. This is a solution for the situation where the data does not fit in RAM (or where the user doesn't have the budget to purchase enough RAM - a 400GB Intel SSD 750 can be had for about 300 USD).


For the scenario you're targeting - databases - this makes a tonne of sense: database data regularly exceeds the size of RAM, and the operations you want to run on it are fairly static, in the sense that they're the SQL operators.

In deep learning you are usually doing a lot more custom processing and your datasets are usually not as big, so just buying more RAM is often cost-effective.



