I try to base my judgments of what LLMs can and can't do primarily on my study and research in related fields. I haven't been surprised by the capabilities of any LLM yet, including GPT-4.
Is that a serious question? Studying a field for years should make outcomes in that field less surprising; otherwise, what have you been doing?
The creators were surprised in the sense of "we got here sooner than expected," not "we didn't think this would work"; otherwise they wouldn't have been working on it. And there has been nothing fundamentally new in LLMs for years; it's just increasing fidelity through massively increased scale.
To be honest, I've been more surprised by the incompetence of people in evaluating these systems, including journalists, programmers, and others who should be in a position to know better.
> The creators were surprised in the sense of "we got here sooner than expected," not "we didn't think this would work"; otherwise they wouldn't have been working on it. And there has been nothing fundamentally new in LLMs for years; it's just increasing fidelity through massively increased scale.
This is categorically false. There are papers being published specifically on the surprising emergent behavior being observed; Wei et al.'s "Emergent Abilities of Large Language Models" (2022) is one example.
I'm paying attention. I think "scale is all you need" is wrong even when it's right. We have a responsibility not to let capabilities outstrip our ability to understand and control them. If we don't do our job, that will be the real "bitter lesson."
However, ultimately it's a text predictor driven by a PRNG, and I stand by my statement. The systems are obviously impressive, but the unrealistic expectations people have, and the anthropomorphization and projection I'm seeing, are even more impressive. Let me know when one starts synthesizing new science or math; by then we'll be in trouble.
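For anyone unsure what "a text predictor driven by a PRNG" means concretely: the model scores every candidate next token, and a pseudo-random draw decides which one is emitted. Here's a minimal sketch of temperature sampling, assuming a toy four-token vocabulary and made-up logits; `sample_next_token` is a hypothetical helper, not any particular library's API.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, rng=None):
    """Pick one token id from a logit vector via temperature sampling.

    The model only supplies `logits`; which token is actually emitted
    depends on the pseudo-random draw below.
    """
    rng = rng or random.Random()
    # Scale logits by temperature, then softmax into probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting categorical distribution with the PRNG.
    r = rng.random()
    cumulative = 0.0
    for token_id, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return token_id
    return len(probs) - 1  # guard against floating-point rounding

# Toy example: hypothetical logits for a 4-token vocabulary.
vocab = ["the", "cat", "sat", "flew"]
logits = [2.1, 0.3, 1.7, -0.5]
rng = random.Random(42)  # fixed seed makes the "prediction" reproducible
print(vocab[sample_next_token(logits, temperature=0.8, rng=rng)])
```

Lower the temperature toward zero and the draw collapses to the single highest-scoring token; raise it and the output gets more random. Either way, the nondeterminism people read so much into is just the seeded draw at the end.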