I’m referring to post-training pruning, not smaller models. It’s already well-studied, but it’s not as useful as it could be on current hardware. (Deep learning currently works better with the extra parameters at training time, so you train big and prune afterward.)
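A minimal sketch of the kind of pruning I mean, using PyTorch’s built-in pruning utilities (the layer size is made up, and a real run would prune a trained checkpoint rather than a fresh layer):

    import torch
    import torch.nn.utils.prune as prune

    # Toy stand-in for one layer of a trained model (hypothetical size).
    layer = torch.nn.Linear(4096, 4096)

    # Magnitude pruning: zero out the 90% of weights with the smallest |w|.
    prune.l1_unstructured(layer, name="weight", amount=0.9)
    prune.remove(layer, "weight")  # bake the mask into the weight tensor

    # The zeros only pay off if the runtime exploits sparsity; dense GPU
    # kernels don't, which is the "current hardware" problem above.
    sparsity = (layer.weight == 0).float().mean().item()
    print(f"sparsity: {sparsity:.2%}")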
Retrieval models that externalize their training data into a datastore (again, lots of published examples: RETRO, etc.) will bring sizes down by roughly that same order of magnitude as well.
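Roughly the retrieval idea, as a toy sketch (brute-force nearest neighbors in NumPy; a real system like RETRO uses a frozen encoder to produce the keys and an approximate index such as FAISS over a huge datastore, none of which is shown here):

    import numpy as np

    # Hypothetical external datastore: (key embedding, text chunk) pairs.
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((1000, 64)).astype(np.float32)
    chunks = [f"chunk-{i}" for i in range(1000)]

    def retrieve(query_emb, k=4):
        # Score every key against the query; return the top-k chunks.
        scores = keys @ query_emb
        top = np.argsort(-scores)[:k]
        return [chunks[i] for i in top]

    # The model conditions on retrieved chunks instead of memorizing them
    # in its weights; that's where the parameter savings come from.
    print(retrieve(rng.standard_normal(64).astype(np.float32)))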
If you can get GPT-like performance out of a 17B model, you should publish that.