I'm surprised that this is IEEE worthy and not just common sense. Of course there'll be huge speedups if, and only if, your dataset fits into main memory and your model fits into GPU memory.
But for most state-of-the-art models (think GPT, with billions of parameters) that is far from the case.
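Quick back-of-envelope (my numbers, purely illustrative): even a 1.5B-parameter model blows past a 16 GB V100 once you count gradients and optimizer state, before storing a single activation.

    # rough back-of-envelope, assuming plain Adam in fp32 (illustrative numbers)
    # each parameter needs: weight + gradient + two Adam moments
    params = 1.5e9                      # e.g. GPT-2 scale
    bytes_per_param = 4 * (1 + 1 + 2)   # 16 bytes per parameter
    print(f"{params * bytes_per_param / 2**30:.0f} GiB")  # ~22 GiB, activations not included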
Yes. The Jukebox model was trained on 512 V100 GPUs for 4 weeks. Try doing that on an $8k workstation.
Not saying it wouldn't be a worthwhile goal to improve the algorithms so that it becomes possible. At least on an 8x V100 machine, for Christ's sake. Because that's all I got.
> At least on an 8x V100 machine, for Christ's sake. Because that's all I got.
Well, that's still one powerful supercomputer, and it allows you to pretrain BERT from scratch in just 33 hours [1].
I mean, that's $100,000 in hardware you have at your disposal right there, which is still an order of magnitude beyond an $8k workstation...
It speaks to the sad state of affairs that is SOTA in ML/AI - only well-funded private institutions (like OpenAI) or multinational tech giants can really afford to achieve it.
That monopolises the technology, and papers like this help democratise it again.
Yes, it would be great to see AI training become more democratized again, but with its mere ~2x speedup this paper won't help that much. Plus, the most expensive part of training a novel AI may well be hiring all the people you need to create a dataset spanning millions of examples.
Training data isn't always an issue. There are plenty of methods that don't require labels or use "weakly labelled" data.
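For instance, a toy "weak labelling" sketch in the Snorkel spirit (made-up heuristics, purely to show that rules can stand in for human annotations):

    # toy weak-labelling sketch: heuristic rules instead of human labels
    # (hypothetical rules, illustrative only)
    def label_review(text):
        t = text.lower()
        if "refund" in t or "broken" in t:
            return 0      # negative
        if "love" in t or "great" in t:
            return 1      # positive
        return None       # abstain; filter out or resolve downstream

    print([label_review(t) for t in ["I love it", "Broken on arrival", "meh"]])
    # -> [1, 0, None]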
And since most contemporary methods only make sense when lots of training data is available in the first place, many companies interested in trying ML already have plenty of manually labelled data on hand.
Their issue is often that they don't want to (or can't, for regulatory reasons) send their data to the public cloud for processing. Any major speed-up is welcome in these scenarios.