
OPT-6.7B is good, but not even close to GPT-3.

If you can get GPT-like performance out of a 17B model, you should publish that.




I’m referring to post-training pruning, not smaller models. This is already well studied, but it’s not as useful as it could be on current hardware (deep learning currently works better with the extra parameters at training time). A minimal sketch of what that looks like in practice is below.
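For concreteness, here’s a rough sketch of post-training magnitude pruning using PyTorch’s torch.nn.utils.prune. The layer sizes and the 90% sparsity level are purely illustrative, not anything from the OPT release:

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Stand-in for a transformer MLP block; sizes are illustrative.
    model = nn.Sequential(
        nn.Linear(512, 2048),
        nn.ReLU(),
        nn.Linear(2048, 512),
    )

    # Zero out the 90% smallest-magnitude weights in each Linear layer,
    # then bake the pruning mask into the weight tensors.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.9)
            prune.remove(module, "weight")

    # The tensors are still dense, so without sparse kernels you pay the
    # same memory and compute -- which is the "current hardware" problem.
    total = sum(p.numel() for p in model.parameters())
    zeros = sum((p == 0).sum().item() for p in model.parameters())
    print(f"sparsity: {zeros / total:.1%}")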

Retrieval models that externalize their data (again, lots of published examples: RETRO, etc.) will bring sizes down by about that order of magnitude as well.
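Roughly, the idea is that facts live in an external chunk store queried at inference time instead of being memorized in the weights. A toy sketch of the retrieval step follows (nearest neighbors over a chunk database; this is not RETRO’s actual chunked cross-attention, and the embedding function is a placeholder):

    import numpy as np

    def embed(text, dim=256):
        # Toy hashed bag-of-words embedding; a real system uses a trained encoder.
        v = np.zeros(dim)
        for token in text.lower().split():
            v[hash(token) % dim] += 1.0
        norm = np.linalg.norm(v)
        return v / norm if norm else v

    # External "database" of text chunks -- the knowledge the model no
    # longer has to store in its parameters.
    corpus = [
        "OPT-6.7B is a 6.7B-parameter open model released by Meta AI.",
        "RETRO conditions generation on chunks retrieved from a large database.",
        "Post-training pruning removes low-magnitude weights after training.",
    ]
    corpus_vecs = np.stack([embed(c) for c in corpus])

    def retrieve(query, k=2):
        # Cosine similarity (vectors are unit-normalized) over the store.
        scores = corpus_vecs @ embed(query)
        return [corpus[i] for i in np.argsort(-scores)[:k]]

    query = "How does RETRO keep parameter counts down?"
    context = "\n".join(retrieve(query))
    # The retrieved chunks then condition a much smaller language model.
    print(context + "\n\nQ: " + query)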


I agree that RETRO is cool. I think you might be stretching it a bit with the applicability, but I take your point.



