Rather than the model actually changing, it's equally likely that this is a cognitive illusion. A new model is initially mind-blowing and enjoys a halo effect. Over time that fades and we get frustrated with limitations that were there all along.
Check out this post recapping a roundtable discussion with Greg Brockman from OpenAI. The GPT models in use in early 2023 were not the performance-degraded quantized versions that are in production now: https://www.reddit.com/r/mlscaling/comments/146rgq2/chatgpt_...
No, it's definitely changed a lot. The speedups have been massive (GPT-4 runs faster now than 3.5-turbo did at launch), and they can't be explained just by the rollout of H100s, since that's only about a 2x inference boost. Some unknown in-house optimization aside, they've probably quantized the models down to a few bits of precision, which increases perplexity quite a bit. They've also continued to RLHF-tune the models to keep them more in line with their guidelines, and that process had been shown to decrease overall performance even before GPT-4 launched.
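For intuition, here's a minimal numpy sketch of plain round-to-nearest symmetric quantization (purely illustrative, nothing to do with whatever OpenAI actually does) showing how the weight-reconstruction error blows up once you're down to a few bits:

    # Fake-quantize a toy weight matrix to `bits` of precision and back,
    # then measure how much of the original signal is lost. On a real LM
    # that lost signal shows up as higher perplexity.
    import numpy as np

    def quantize_dequantize(w, bits):
        qmax = 2 ** (bits - 1) - 1          # e.g. 7 for signed 4-bit
        scale = np.abs(w).max() / qmax      # per-tensor symmetric scale
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        return q * scale                    # dequantized approximation

    rng = np.random.default_rng(0)
    w = rng.normal(0, 0.02, size=(1024, 1024))  # toy transformer-ish weight

    for bits in (8, 4, 3, 2):
        w_hat = quantize_dequantize(w, bits)
        rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
        print(f"{bits}-bit: relative weight error {rel_err:.3f}")

8-bit is nearly lossless, but the error grows quickly at 4 bits and below, which is why aggressive quantization without extra calibration tends to hurt quality.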
But given the rumored architecture (a mixture of experts), it would make complete sense for them to dynamically scale down the number of experts activated per token during periods of peak load.
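Something like this toy top-k gate is all it would take (ToyMoE is hypothetical, a generic sketch of standard top-k routing, not anything known about OpenAI's internals): serve top_k=1 instead of 2 under load and you roughly halve the expert compute per token, at the cost of output quality.

    # Toy mixture-of-experts layer with a tunable number of active experts.
    import torch
    import torch.nn as nn

    class ToyMoE(nn.Module):
        def __init__(self, d_model=64, n_experts=8):
            super().__init__()
            self.gate = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                [nn.Linear(d_model, d_model) for _ in range(n_experts)])

        def forward(self, x, top_k=2):
            # Route each token to its top_k experts, weighted by gate scores.
            scores = self.gate(x)                        # (tokens, n_experts)
            weights, idx = scores.topk(top_k, dim=-1)    # pick top_k per token
            weights = weights.softmax(dim=-1)
            out = torch.zeros_like(x)
            for slot in range(top_k):
                for e in range(len(self.experts)):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
            return out

    x = torch.randn(16, 64)
    moe = ToyMoE()
    full = moe(x, top_k=2)    # normal serving: two experts per token
    cheap = moe(x, top_k=1)   # "peak load" mode: half the expert FLOPs per token

Flipping that one serving parameter would be invisible to users except as slightly worse answers, which is exactly the kind of degradation people have been reporting.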