See my comment elsewhere on this post. Greg Brockman, head of strategic initiatives at OpenAI, said at a round-table discussion in Korea a few weeks ago that they had to start using a quantized (smaller, cheaper) model earlier in 2023. I noticed a switch in March 2023, with GPT-4 performance severely degraded after that for both English-language tasks and code-related tasks (reading and writing).