
TensorFlow supports distributed training with a client-server model.
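
For concreteness, a minimal sketch of that client-server (parameter-server) setup in recent TF 2.x; the cluster layout and the toy model are placeholder assumptions, and the ps/worker processes would each run a tf.distribute.Server:

    import tensorflow as tf

    # The coordinator process reads the cluster layout (chief / worker / ps
    # hosts) from the TF_CONFIG environment variable.
    resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
    strategy = tf.distribute.ParameterServerStrategy(resolver)

    with strategy.scope():
        # Variables created here are placed on the parameter servers;
        # workers fetch them, compute gradients, and send updates back.
        model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
        model.compile(optimizer="sgd", loss="mse")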



Does it also solve the problem of everyone having different hardware?


It does.

For most models, though, your home broadband would be far too slow.


Is it because they would have to communicate the errors (gradients) back during training? I forgot that training these models is more of a global task than protein folding; in that sense it is less parallelizable over the internet.


Yes, and also activations if your GPU is too small to fit the whole model. The minimum useful bandwidth for that stuff is a few gigabits per second...
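
A rough back-of-envelope, with illustrative numbers rather than measurements, of why home uplinks don't cut it for data-parallel training:

    # Each data-parallel step exchanges a full set of gradients.
    params = 1_000_000_000         # assume a 1B-parameter model
    payload_bits = params * 2 * 8  # fp16 gradients: ~16 Gbit per step

    home_uplink = 50e6             # assume a 50 Mbit/s upload link
    dc_link = 100e9                # ~100 Gbit/s datacenter interconnect

    print(payload_bits / home_uplink)  # ~320 s per step just on gradient traffic
    print(payload_bits / dc_link)      # ~0.16 s on datacenter-class links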


What about some kind of sharding: parts of the computation that could be executed in isolation for longer periods of time?


An ongoing research problem. OpenAI would certainly like to be able to use smaller GPUs instead of having to fit the entire model into one.
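
The sharding in question is model (pipeline) parallelism. A toy sketch with explicit device placement, assuming two local GPUs (device strings and layer sizes are placeholders): each shard computes in isolation, but the activations still have to cross the link between shards on every step, which is the same bandwidth problem again:

    import tensorflow as tf

    # Two shards of a toy two-layer network, pinned to different devices.
    with tf.device("/GPU:0"):
        w1 = tf.Variable(tf.random.normal([784, 4096]))
    with tf.device("/GPU:1"):
        w2 = tf.Variable(tf.random.normal([4096, 10]))

    @tf.function
    def forward(x):
        with tf.device("/GPU:0"):
            h = tf.nn.relu(x @ w1)  # shard 1 computes its activations...
        with tf.device("/GPU:1"):
            return h @ w2           # ...which must cross the interconnect here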


GPT-3 does not fit in any one GPU that exists at present. It's already spread out across multiple GPUs.



