
slightly off-topic:

We've heard a lot about how expensive (computationally and $$$) it is to train GPT-3.

But how expensive is each query? AI Dungeon seems to operate at a pretty big scale, and it generates results for each input very quickly.

My desktop machine couldn't train GPT-3 in a million years (possibly hyperbole, I didn't do the math). But could it run a query in a reasonable timeframe? Do I need 128 GB of RAM?

How long until my smartphone hosts the entirety of the pre-trained model, and runs queries on it without any network connection?




The 175B GPT-3 model itself is about 300 GB in FP16 weights.

Server GPUs that consumers can rent with Tensor Cores capable of FP16 inference cap out at 16 GB of VRAM (the recently announced Ampere-based GPUs cap out at 40 GB).

tl;dr a local version of the 175B model will not happen any time soon, which is why I really really wish OpenAI would release the smaller models.
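To see why, the back-of-the-envelope arithmetic can be sketched like this (assuming 2 bytes per FP16 parameter; the GPU memory sizes are the ones mentioned above):

```python
# Can a 175B-parameter FP16 model fit in GPU memory?
params = 175e9
bytes_per_param = 2  # FP16

model_gb = params * bytes_per_param / 1e9
print(f"model weights: {model_gb:.0f} GB")  # ~350 GB, same ballpark as the ~300 GB figure

for vram_gb in (16, 40):  # rentable server GPU / Ampere-class GPU
    gpus_needed = -(-model_gb // vram_gb)  # ceiling division
    print(f"{vram_gb} GB VRAM -> at least {gpus_needed:.0f} GPUs just to hold the weights")
```

And that count covers only the weights, before activations or KV caches, so a single consumer box is out of the question.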


The paper quotes a figure of about 0.4 kWh per 100 pages of generated text, which works out to roughly $0.002 per page once you take hardware cost into account too.

Source: https://arxiv.org/pdf/2005.14165.pdf section 6.3. Or for a more monetary overview: https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-...
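A rough sanity check of those numbers (the $0.10/kWh electricity price is my assumption, not from the paper):

```python
# Rough cost per generated page, from the paper's 0.4 kWh / 100 pages figure.
kwh_per_100_pages = 0.4
electricity_usd_per_kwh = 0.10  # assumed electricity price

energy_cost_per_page = kwh_per_100_pages / 100 * electricity_usd_per_kwh
print(f"energy cost per page: ${energy_cost_per_page:.5f}")  # $0.00040

# If the all-in figure is ~$0.002/page, amortized hardware dominates the cost:
total_per_page = 0.002
hardware_share = 1 - energy_cost_per_page / total_per_page
print(f"implied hardware share: {hardware_share:.0%}")  # 80%
```

So under that assumption, electricity is only a fifth of the per-page cost; the rest is the GPUs themselves.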


Cheaper than my workplace pays to print a colour page.


Queries seem to be pretty heavyweight: on AI Dungeon there is noticeable lag, and it sometimes times out. It will be interesting to see how OpenAI prices queries.

Google has managed to get machine translation to work offline on cell phones [1] so it seems plausible that some very compressed model might eventually work if they made the effort, but it probably wouldn't be the same.

[1] https://support.google.com/translate/answer/6142473?co=GENIE...
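To get a feel for how much compression would be needed, a hypothetical sketch (the precision levels are illustrative assumptions, not an actual deployment plan):

```python
# How small could a 175B-parameter model get at lower weight precisions?
params = 175e9
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = params * bits / 8 / 1e9
    print(f"{name}: {gb:.1f} GB")
# Even at 4 bits per weight (~88 GB), the full model is far beyond phone
# storage and RAM, so an on-device version would likely need to be a much
# smaller distilled model rather than just a quantized 175B one.
```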



