
slightly off-topic:

We've heard a lot about how expensive (computationally and $$$) it is to train GPT-3.

But how expensive is each query? AI Dungeon seems to operate at a pretty big scale, and it generates results for each input very quickly.

My desktop machine couldn't train GPT-3 in a million years (possibly hyperbole, I didn't do the math). But could it run a query in a reasonable timeframe? Do I need 128 GB of RAM?

How long until my smartphone hosts the entirety of the pre-trained model, and runs queries on it without any network connection?




The 175B GPT-3 model itself is about 300 GB in FP16 weights.

Server GPUs that consumers can rent with Tensor Cores capable of FP16 inference cap out at 16 GB of VRAM (the recently announced Ampere-based GPUs cap out at 40 GB).

tl;dr a local version of the 175B model will not happen any time soon, which is why I really really wish OpenAI would release the smaller models.
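To see why, the back-of-the-envelope arithmetic can be sketched like this (assuming 2 bytes per FP16 parameter; the GPU memory sizes are the ones mentioned above):

```python
# Can a 175B-parameter FP16 model fit in GPU memory?
params = 175e9
bytes_per_param = 2  # FP16

model_gb = params * bytes_per_param / 1e9
print(f"model weights: {model_gb:.0f} GB")  # ~350 GB, same ballpark as the ~300 GB figure

for vram_gb in (16, 40):  # rentable server GPU / Ampere-class GPU
    gpus_needed = -(-model_gb // vram_gb)  # ceiling division
    print(f"{vram_gb} GB VRAM -> at least {gpus_needed:.0f} GPUs just to hold the weights")
```

And that count covers only the weights, before activations or KV caches, so a single consumer box is out of the question.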


The paper quotes a figure of about 0.4 kWh per 100 pages of generated text, which works out to roughly $0.002 per page once you take hardware cost into account too.

Source: https://arxiv.org/pdf/2005.14165.pdf section 6.3. Or for a more monetary overview: https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-...
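A rough sanity check of those numbers (the $0.10/kWh electricity price is my assumption, not from the paper):

```python
# Rough cost per generated page, from the paper's 0.4 kWh / 100 pages figure.
kwh_per_100_pages = 0.4
electricity_usd_per_kwh = 0.10  # assumed electricity price

energy_cost_per_page = kwh_per_100_pages / 100 * electricity_usd_per_kwh
print(f"energy cost per page: ${energy_cost_per_page:.5f}")  # $0.00040

# If the all-in figure is ~$0.002/page, amortized hardware dominates the cost:
total_per_page = 0.002
hardware_share = 1 - energy_cost_per_page / total_per_page
print(f"implied hardware share: {hardware_share:.0%}")  # 80%
```

So under that assumption, electricity is only a fifth of the per-page cost; the rest is the GPUs themselves.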


Cheaper than my workplace pays to print a colour page.


Queries seem to be pretty heavyweight: on AI Dungeon there is noticeable lag, and it sometimes times out. It will be interesting to see how OpenAI prices queries.

Google has managed to get machine translation to work offline on cell phones [1] so it seems plausible that some very compressed model might eventually work if they made the effort, but it probably wouldn't be the same.

[1] https://support.google.com/translate/answer/6142473?co=GENIE...
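To get a feel for how much compression would be needed, a hypothetical sketch (the precision levels are illustrative assumptions, not an actual deployment plan):

```python
# How small could a 175B-parameter model get at lower weight precisions?
params = 175e9
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = params * bits / 8 / 1e9
    print(f"{name}: {gb:.1f} GB")
# Even at 4 bits per weight (~88 GB), the full model is far beyond phone
# storage and RAM, so an on-device version would likely need to be a much
# smaller distilled model rather than just a quantized 175B one.
```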



