I’m referring to post-training pruning, not smaller models. It’s already well-studied, but it’s not as useful as it could be on current hardware. (Deep learning currently works better with the extra parameters at training time, so you train big and prune afterward.)
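A minimal sketch of the kind of pruning I mean, using PyTorch’s built-in pruning utilities (the layer size is made up, and a real run would prune a trained checkpoint rather than a fresh layer):

    import torch
    import torch.nn.utils.prune as prune

    # Toy stand-in for one layer of a trained model (hypothetical size).
    layer = torch.nn.Linear(4096, 4096)

    # Magnitude pruning: zero out the 90% of weights with the smallest |w|.
    prune.l1_unstructured(layer, name="weight", amount=0.9)
    prune.remove(layer, "weight")  # bake the mask into the weight tensor

    # The zeros only pay off if the runtime exploits sparsity; dense GPU
    # kernels don't, which is the "current hardware" problem above.
    sparsity = (layer.weight == 0).float().mean().item()
    print(f"sparsity: {sparsity:.2%}")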
Retrieval models that externalize their training data into a datastore (again, lots of published examples: RETRO, etc.) will bring sizes down by roughly that same order of magnitude as well.
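Roughly the retrieval idea, as a toy sketch (brute-force nearest neighbors in NumPy; a real system like RETRO uses a frozen encoder to produce the keys and an approximate index such as FAISS over a huge datastore, none of which is shown here):

    import numpy as np

    # Hypothetical external datastore: (key embedding, text chunk) pairs.
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((1000, 64)).astype(np.float32)
    chunks = [f"chunk-{i}" for i in range(1000)]

    def retrieve(query_emb, k=4):
        # Score every key against the query; return the top-k chunks.
        scores = keys @ query_emb
        top = np.argsort(-scores)[:k]
        return [chunks[i] for i in top]

    # The model conditions on retrieved chunks instead of memorizing them
    # in its weights; that's where the parameter savings come from.
    print(retrieve(rng.standard_normal(64).astype(np.float32)))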
If you can get GPT-like performance out of a 17B model, you should publish that.