
TensorFlow supports distributed training with a client-server model.
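
For concreteness, a minimal sketch of that client-server (parameter-server) setup in recent TF 2.x; the cluster layout and the toy model are placeholder assumptions, and the ps/worker processes would each run a tf.distribute.Server:

    import tensorflow as tf

    # The coordinator process reads the cluster layout (chief / worker / ps
    # hosts) from the TF_CONFIG environment variable.
    resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
    strategy = tf.distribute.ParameterServerStrategy(resolver)

    with strategy.scope():
        # Variables created here are placed on the parameter servers;
        # workers fetch them, compute gradients, and send updates back.
        model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
        model.compile(optimizer="sgd", loss="mse")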



Does it also solve the problem of everyone having different hardware?


It does.

For most models, though, your home broadband would be far too slow.


Is it because they would have to communicate the errors (gradients) back during training? I forgot that training these models is more of a global task than protein folding; in that sense it is less parallelizable over the internet.


Yes, and also activations if your GPU is too small to fit the whole model. The minimum useful bandwidth for that stuff is a few gigabits per second...
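
A rough back-of-envelope, with illustrative numbers rather than measurements, of why home uplinks don't cut it for data-parallel training:

    # Each data-parallel step exchanges a full set of gradients.
    params = 1_000_000_000         # assume a 1B-parameter model
    payload_bits = params * 2 * 8  # fp16 gradients: ~16 Gbit per step

    home_uplink = 50e6             # assume a 50 Mbit/s upload link
    dc_link = 100e9                # ~100 Gbit/s datacenter interconnect

    print(payload_bits / home_uplink)  # ~320 s per step just on gradient traffic
    print(payload_bits / dc_link)      # ~0.16 s on datacenter-class links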


What about some kind of sharding: parts of the computation that could be executed in isolation for longer periods of time?


An ongoing research problem. OpenAI would certainly like to be able to use smaller GPUs instead of having to fit the entire model into one.
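
The sharding in question is model (pipeline) parallelism. A toy sketch with explicit device placement, assuming two local GPUs (device strings and layer sizes are placeholders): each shard computes in isolation, but the activations still have to cross the link between shards on every step, which is the same bandwidth problem again:

    import tensorflow as tf

    # Two shards of a toy two-layer network, pinned to different devices.
    with tf.device("/GPU:0"):
        w1 = tf.Variable(tf.random.normal([784, 4096]))
    with tf.device("/GPU:1"):
        w2 = tf.Variable(tf.random.normal([4096, 10]))

    @tf.function
    def forward(x):
        with tf.device("/GPU:0"):
            h = tf.nn.relu(x @ w1)  # shard 1 computes its activations...
        with tf.device("/GPU:1"):
            return h @ w2           # ...which must cross the interconnect here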


GPT-3 does not fit in any one GPU that exists at present. It's already spread out across multiple GPUs.



