This statement is meaningless without controlling for model complexity and data type. Given their simplicity, ANNs generalize well on a wide variety of data. GPT-3 yields almost human-level generalization ability for some tasks.
I also clarified that the generalization is probably not far. There is not much complex "realism" to be found in low-to-mid-level features; they're almost mathematical in their simplicity, similar to basis functions.
>> GPT-3 yields almost human-level generalization ability for some tasks.
That's an extravagant claim. There is no machine learning system, algorithm, or technique that can approach the ability of humans to "generalise" in any task, no matter how you want to define "generalisation". The models built by ANNs in particular are shallow and over-specialised, and have none of the depth or complexity of whatever "models" of the world, and of the entities in it, that humans build in our heads.
Evaluations that show "superhuman" ability are poorly designed. Machine learning research is following benchmarks and metrics that mean nothing and show nothing, beyond the ability to beat said benchmarks with said metrics, which is then blithely taken to mean "progress" towards the approximation of human intelligence. This then leads to hyperbole like in your comment.
> no matter how you want to define "generalisation"
... and how you want to define "task". For some prompts/"tasks", GPT-3 does generate impressive (more than trivial) outputs that cannot be found on the internet and that are indistinguishable from what a human would respond, so it generalizes in that sense. Maybe human ingenuity and generalization are also just slightly perturbed interpolation? It is very difficult to produce something truly novel, so we are also rather tightly limited by prior experience. Who knows? Also, who cares if submarines swim? Anyhow, it seems 50% of the internet is bikeshedding about definitions.
Language generation is a very good example of a task that is very hard to evaluate with any degree of objectivity and for which there are no good metrics.
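To make the "no good metrics" point concrete: the automatic metrics that do exist for language generation (BLEU and its relatives) mostly count surface n-gram overlap with a reference text. A toy sketch of modified n-gram precision, the core of BLEU, shows the failure mode: a perfect paraphrase scores near zero while a verbatim copy scores perfectly. (The example sentences and function name here are illustrative, not from the thread.)

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also appear in the reference
    (with clipped counts, as in BLEU's modified precision)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

reference  = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"  # same meaning, different words
copy       = "the cat sat on the mat"

# A human judge would accept both outputs, but the metric only rewards
# the verbatim copy: the paraphrase shares just one token ("the").
print(ngram_precision(paraphrase, reference))  # ~0.17
print(ngram_precision(copy, reference))        # 1.0
```

The metric says nothing about meaning, fluency, or "human-ness"; it only measures overlap with one particular reference, which is exactly why beating such metrics is weak evidence of progress.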
So, suppose you say that a particular bit of text generated by GPT-3 is "indistinguishable from what a human would respond". If I say it isn't, how can we decide who is right in a way that we can both agree on?
And that's all before we try to figure out "generalisation".
In the Turing Test you have one human judge. I'm asking what happens when two humans disagree about the human-ness of some automatically generated text.