> and no one is actually testing what proportions of results those are
The possibility that models overfit their training data is hardly a new concern. It's standard practice to test for it — see Section 7 of the PaLM paper, for example: https://arxiv.org/abs/2204.02311
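As a rough illustration of what such a test looks like, here's a minimal sketch of a train/eval contamination check: flag any evaluation example that shares a long n-gram with the training corpus. This is a hypothetical simplification, not the PaLM paper's actual methodology (which operates on tokenized data at much larger scale); the function names and the choice of `n=8` are assumptions for illustration.

```python
def ngrams(tokens, n):
    """Return the set of all n-grams (as tuples) of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contaminated(train_texts, eval_text, n=8):
    """True if any n-gram of eval_text also appears in the training data.

    A shared long n-gram suggests the eval example (or part of it) leaked
    into training, so benchmark scores on it may reflect memorization.
    """
    eval_grams = ngrams(eval_text.split(), n)
    for doc in train_texts:
        if eval_grams & ngrams(doc.split(), n):
            return True
    return False

train = ["the quick brown fox jumps over the lazy dog near the river bank"]
print(contaminated(train, "quick brown fox jumps over the lazy dog near the"))   # True
print(contaminated(train, "a completely unrelated sentence about language models"))  # False
```

Real contamination analyses typically run this kind of overlap check over the whole benchmark and then report scores separately for the "clean" and "overlapping" subsets, which is how you test whether apparent capability is just memorization.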