I think what people were tuning into more so than “how good are the models” was “how quickly are the models getting better?”
The jumps from ~nothing general-purpose/consumer-facing to GPT-3, 3.5, Copilot, and GPT-4 all seemed enormous and came pretty much back-to-back. Extending that curve points to some pretty extreme destinations (positive or negative), but now the sentiment seems to be that the curve was a bit of a mirage.
I intuitively share the view that the curve was a mirage (a byproduct of years of R&D backlog plus OpenAI’s release cadence), but that isn’t coming from any rigorous analysis.
It all depends on the parameters and constraints. If you spend a lot of time defining the task, providing sufficient context, and clarifying, they work quite well. I think the reality is that people's expectations were unreasonably optimistic and now they are unjustifiably pessimistic. The general public seems to oscillate between extremes quite rapidly without appreciating the nuances.