teruakohatu on Jan 18, 2021 | on: GPT-Neo – Building a GPT-3-sized model, open sourc...
TensorFlow supports distributed training with a client-server model.
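(For readers unfamiliar with that mode: below is a minimal sketch of TensorFlow's parameter-server flavour of distributed training, roughly the TF 2.4-era API that was current at the time of this thread. The hostnames, ports, and tiny model are placeholders, not anything from the thread.)

```python
# Minimal sketch of TensorFlow parameter-server ("client-server") training.
# Hostnames, ports, and the toy model are illustrative assumptions.
import json
import os

import tensorflow as tf

# Every process reads its role from TF_CONFIG: "ps" tasks hold the variables,
# "worker" tasks compute gradients against them, and the "chief" coordinates.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "chief": ["chief.example.com:2222"],
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
        "ps": ["ps0.example.com:2222"],
    },
    "task": {"type": "chief", "index": 0},  # this process is the coordinator
})

cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()

# Worker and ps processes would instead just start a server and block:
#   tf.distribute.Server(cluster_resolver.cluster_spec(),
#                        job_name=cluster_resolver.task_type,
#                        task_index=cluster_resolver.task_id).join()

strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)

with strategy.scope():
    # Variables created here are placed on (and sharded across) the ps tasks.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    optimizer = tf.keras.optimizers.Adam()

# The coordinator dispatches training steps to workers, which pull variables
# from the ps tasks and push gradient updates back -- the traffic that a home
# broadband link would choke on, as the rest of the thread discusses.
coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)
```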
newswasboring on Jan 18, 2021
Does it also solve the problem of everyone having different hardware?
londons_explore on Jan 18, 2021
It does.
For most models, your home broadband would be far too slow though.
newswasboring on Jan 18, 2021
Is it because they will have to communicate errors back during training? I had forgotten that training these models is more of a global task than protein folding. In that sense it is less parallelizable over the internet.
londons_explore on Jan 18, 2021
Yes, and also activations if your GPU is too small to fit the whole model. The minimum useful bandwidth for that stuff is a few gigabits per second...
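(To put rough numbers on that claim: the figures below are illustrative assumptions, not the commenter's, but they show why residential links fall orders of magnitude short even before counting activation traffic.)

```python
# Back-of-envelope estimate of gradient traffic in plain data-parallel training.
# Model size and link speeds are assumptions chosen for illustration.
params = 1.3e9                                     # assume a 1.3B-parameter model
bytes_per_gradient = 2                             # fp16 gradients
bits_per_step = params * bytes_per_gradient * 8    # ~21 Gbit exchanged per step

home_uplink = 20e6        # assume a 20 Mbit/s residential uplink
fast_link = 10e9          # a 10 Gbit/s link, i.e. "a few gigabits per second"

print(f"home broadband: {bits_per_step / home_uplink / 60:.0f} min per step")
print(f"fast link:      {bits_per_step / fast_link:.1f} s per step")
# ~17 minutes per step over home broadband vs ~2 seconds on a fast link --
# and that is before the activations needed when the model itself is split.
```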
emteycz on Jan 18, 2021
What about some kind of sharding: parts of the computation that could be executed in isolation for a longer period of time?
Filligree on Jan 18, 2021
An ongoing research problem. OpenAI would certainly like to be able to use smaller GPUs, instead of having to fit the entire model into one.
jne2356 on Jan 18, 2021
GPT-3 does not fit in any one GPU that exists at present. It's already spread out across multiple GPUs.
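(For context on what "spread out across multiple GPUs" means in practice, here is a toy sketch of layer-wise model parallelism in TensorFlow. It is not GPT-3's actual partitioning scheme; it only illustrates where the activation traffic mentioned above comes from.)

```python
# Toy layer-wise model parallelism: each half of the model lives on its own
# GPU, and the activations `h` cross the device link on every step.
# Illustrative only -- not how GPT-3 is actually partitioned.
import tensorflow as tf

inputs = tf.keras.Input(shape=(1024,))
with tf.device("/GPU:0"):
    h = tf.keras.layers.Dense(4096, activation="relu")(inputs)
with tf.device("/GPU:1"):
    outputs = tf.keras.layers.Dense(1024)(h)

model = tf.keras.Model(inputs, outputs)
```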