Hacker News

Relevant comment thread from people describing how much worse GPT-4 has gotten lately: https://www.reddit.com/r/ChatGPT/comments/14ruui2/i_use_chat...



I have followed many of these types of posts. In every single instance, no one provides even the _slightest_ evidence. No before/after with the same prompt.

OpenAI even has a whole repository specifically for this - Evals (openai/evals). No one uses it.

I'm not saying the theories are wrong. Maybe there is something behind the hunches so many people seem to have about degradation. But there isn't _any_ proof. None. Whatsoever. And people are taking _internet comments_ as that proof instead? Sure, it's easy to be cynical about companies in this day and age, which is why I would ultimately believe someone if they provided actual evidence. But, again - not a single ounce of proof has been provided in any one of these threads.

Furthermore, the lack of rigor being applied even with the various anecdotes is appalling.

Which version are you talking about? GPT-4 or GPT-3? Are you using the API or the web interface? Are you aware that output is non-deterministic? Are you aware that your own psychological biases will skew your opinions on the matter? One or more of these questions tend to go unanswered.

Just please, show me some robust proof. If you can't because you didn't think to collect it, you _surely_ must realize that many people are building entire businesses on top of this tech, and at least _one_ of them is running these types of evaluations. Furthermore, the model is now state-of-the-art in research as well; if you could _prove_ in a paper that the model has degraded and that OpenAI is lying about it, you would get citations. And yet, there is nothing. Zilch. Nada.
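To be concrete about what "before/after with the same prompt" would look like: a minimal sketch of a comparison harness, assuming you had logged greedy (temperature=0) responses for a fixed prompt set, against the same pinned model version, on two different dates. All names and data here are hypothetical; this is not the openai/evals API, just the basic idea.

```python
# Minimal before/after comparison (hypothetical logged data).
# Assumes responses were collected with the same prompts, the same
# pinned model version, and temperature=0 on two different dates.

def agreement_rate(before: dict, after: dict) -> float:
    """Fraction of shared prompts whose responses are byte-identical."""
    shared = before.keys() & after.keys()
    if not shared:
        return 0.0
    same = sum(1 for p in shared if before[p] == after[p])
    return same / len(shared)

# Hypothetical snapshots of logged outputs
june = {"2+2?": "4", "Capital of France?": "Paris"}
july = {"2+2?": "4", "Capital of France?": "The capital is Paris."}

print(agreement_rate(june, july))  # 0.5
```

Exact-match agreement is a crude metric (a reworded answer is not necessarily a worse one), but even this much would be more evidence than any of these threads has produced.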



