
Nowhere do they define "AGI". I guarantee that is a big reason why the predictions have so much variance.

For many people, what GPT-4 does qualified as AGI -- up until GPT-4 came out and then everyone seemed to decide that AGI meant ASI.

I am guessing for many people answering this poll it means "a full emulation of a person". Or maybe it had to be "alive".

The thing that irritates me so much is this lack of definition and moving of goalposts, and also that people stupidly seem to assume that AI can't be general purpose or useful unless it has all of those animal/human characteristics.

GPT-4 is very general. Make it, say, 10-20 times faster, open up the image modality to the public, and you will be able to do most human tasks with it. You don't need to invent a lot of other stuff to be general purpose.

You DO need to invent a lot of other stuff to become some kind of humanlike digital god. But that is not the least bit necessary to accomplish most human work output.




ChatGPT-4 is amazing. I still can't comprehend that they built this.

However if you work with it a lot you realize it doesn't understand reality, it predicts language.

It reminds me of YouTube chess videos where humans start to figure out chess bots and the tricks they use to draw out time.

Understanding language is super impressive, but a far cry from general intelligence. Maybe from here OpenAI can build further, but I have a hard time believing that this will be a foundation for general intelligence.

A lot of animals have a form of general intelligence but can't do math or language, yet in some ways they are more capable than the latest self-driving cars.


Is it surprising that a system that was exclusively trained on emitted language will only do well on language?

I don't see how I can extrapolate to anything beyond that. I certainly can't extrapolate that it would be unable to learn other tasks.


Everything is defined in language. As a system is able to learn from the corpus of texts, it also builds reasoning to a large extent. For example, GPT-4 can guide you through solving a problem like building a flying and crawling drone for inspecting an attic, or watering plants. So this goes beyond language, unless I missed your point.


Karpathy explains it far better than I did in his "State of GPT" talk about a month ago: https://www.youtube.com/watch?v=bZQun8Y4L2A

Yes, language can do that, but books and other texts, on average, leave out a lot of steps that humans have learned to do naturally. The fact that these models can be prompted to do those steps, and that their answers improve when they are, indicates that there is definitely some level of "reasoning" there.

But it has weird limitations in its current forms. I expect that to improve pretty quickly, tbh.


Not everything. For example, how did Wagner create his music? The ability to appreciate beauty stands above reasoning. But your point largely stands, as language processing alone allows one to build a "linear thinking machine" with superhuman reasoning ability. It won't be able to make music or art, but it will create a spaceship and take science to the next level. As a side effect, such an AI will erase all art, as art will look like noise in its "mind".


This is the problem here.

A claim is made: "GPT isn't general" or "GPT isn't intelligent" or whatever, but a testable definition is not given. That is, what is generality to you, and what bar needs to be passed? What is intelligence, and what competence level needs to be surpassed?

Without clearly stating those things, intelligence may well mean anything or any goal, and your goalposts can shift anywhere.

I'm just going to tell you the facts of the state we're in right now.

Any testable definition of AGI that GPT-4 fails would also be failed by a significant chunk of the human population.


> Any testable definition of AGI that GPT-4 fails would also be failed by a significant chunk of the human population.

GPT cannot function in any environment on its own. Except in response to a direct instruction, it cannot plan, it cannot take action, it cannot learn, it cannot adjust to changing circumstances. It cannot acquire or process energy, it has no intentionality, it has no purpose beyond generating new text. It's not intelligent in any sense, let alone generally. It's an incredibly capable tool.

Here's a testable definition of AGI - any combination of software and hardware that can function independently of human supervision and maintenance, in response to circumstances that have not been preprogrammed.

That's it. Zero-trial learning and functioning. All adult organisms can do it; no AI can. Artificial general intelligence that's actually useful would need a bunch of additional functionality, of course; there I'll agree with you.


>GPT cannot function in any environment on its own. Except in response to a direct instruction, it cannot plan, it cannot take action, it cannot learn, it cannot adjust to changing circumstances.

Sure it can. It's not the default behavior, sure, but it's fairly trivial to set up, just expensive. GPT-4 can loop on its "thoughts" and reflect, and it can take actions in the real world.

https://tidybot.cs.princeton.edu/ https://arxiv.org/abs/2304.03442 https://arxiv.org/abs/2210.03629 https://arxiv.org/abs/2303.11366
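
To make that concrete, here is a rough sketch of what such a loop can look like. The model name, prompt wording, and stop condition are my own illustrative choices, not taken from the linked papers, and it assumes the OpenAI Python SDK with an API key already configured:

    # Hedged sketch of a reflect-and-retry loop: generate, self-critique, revise.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def ask(messages):
        resp = client.chat.completions.create(model="gpt-4", messages=messages)
        return resp.choices[0].message.content

    def solve_with_reflection(task, max_rounds=3):
        answer = ask([{"role": "user", "content": task}])
        for _ in range(max_rounds):
            # Ask the model to critique its own attempt.
            critique = ask([{"role": "user", "content":
                f"Task: {task}\nAttempt: {answer}\n"
                "List any errors or omissions, or reply DONE if the attempt is fine."}])
            if critique.strip().startswith("DONE"):
                break
            # Feed the critique back in and try again.
            answer = ask([{"role": "user", "content":
                f"Task: {task}\nPrevious attempt: {answer}\nCritique: {critique}\n"
                "Write an improved answer."}])
        return answer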


> Here's a testable definition of AGI - any combination of software and hardware that can function independently of human supervision and maintenance, in response to circumstances that have not been preprogrammed.

Not sure how I feel about a significant portion of the population already not meeting that threshold. The elderly? The disabled? I think you're proving the parent's point.


> Not sure how I feel about a significant portion of the population already not meeting that threshold. The elderly? The disabled?

Anyone? How did the saying go? The minimum viable unit of reproduction for homo sapiens is a village.

None of us passes the bar, if the test excludes "supervision and maintenance" by other people - and not just our peers, but our parents, and their parents, and their parents, ... all the way back until we reach some self-sufficient-ish animal life form. That's, AFAIR, way below primates on the evolutionary scale.

But that test is bad also for other reasons, including but not limited to:

- It is confusing intelligence with survival. Survival is not intelligence, it is a highly likely[0] consequence of it[1].

- It underplays the non-random aspect of it. Phrased like GP phrased it, a rock can pass this test. The power of intelligence isn't in passively enduring novel circumstances - it's in actively navigating the world to get more of what you want. Including changing the world itself.

--

[0] - A process optimizing for just about any goal will find its own survival to be beneficial towards achieving that goal.

[1] - If you extend it from human-like intelligence to general optimization, then all life is an example of this: survival is a consequence of natural selection - the most basic form of optimization, that arises when you get a self-replicating something that replicates with some variability into similar self-replicating somethings, and when that variability affects survival.


Define "reality."

Human experience consists of more than language, for sure. There is also a visual, auditory, and tactile component. But GPT-4 can also take visual inputs from my understanding, and it shouldn't be difficult to add the other senses.

Google's experiments with PaLM-E show that LLMs are also useful in an embodied context, such as helping robots solve problems in real time.


> you will be able to do most human tasks with it. You don't need to invent a lot of other stuff to be general purpose.

I think this is where most people strongly disagree with you. A probabilistic language model is not good enough to do anything requiring context particularly well.


Agentic uses of GPT can solve the context problem by breaking problems into steps and building prompts to solve those steps.

Can't do everything, but GPT+APIs can do a lot.
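
As a hedged illustration of that decomposition idea (the model name and prompt wording are assumptions, not any particular agent framework): plan first, then give each step its own small, focused prompt so no single call has to hold the whole context.

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def run_task(task):
        # Planning call: turn the problem into explicit steps.
        plan = ask(f"Break this task into short numbered steps:\n{task}")
        steps = [s for s in plan.splitlines() if s.strip()]
        results = []
        for step in steps:
            # Carry only the last few results forward, keeping each prompt small.
            prior = "\n".join(results[-3:])
            results.append(ask(f"Task: {task}\nCurrent step: {step}\nPrior results:\n{prior}"))
        return results[-1]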


What in Chomsky's name is "agentic"?


> Nowhere do they define "AGI"

Ummm, maybe you should have looked? At the top of the very first prediction, here: https://www.metaculus.com/questions/5121/date-of-artificial-...

We will thus define "an AI system" as a single unified software system that can satisfy the following criteria, all completable by at least some humans.

Able to reliably pass a 2-hour, adversarial Turing test during which the participants can send text, images, and audio files (as is done in ordinary text messaging applications) during the course of their conversation. An 'adversarial' Turing test is one in which the human judges are instructed to ask interesting and difficult questions, designed to advantage human participants, and to successfully unmask the computer as an impostor. A single demonstration of an AI passing such a Turing test, or one that is sufficiently similar, will be sufficient for this condition, so long as the test is well-designed to the estimation of Metaculus Admins.

Has general robotic capabilities, of the type able to autonomously, when equipped with appropriate actuators and when given human-readable instructions, satisfactorily assemble a (or the equivalent of a) circa-2021 Ferrari 312 T4 1:8 scale automobile model. A single demonstration of this ability, or a sufficiently similar demonstration, will be considered sufficient.

High competency across a diverse range of fields of expertise, as measured by achieving at least 75% accuracy in every task and 90% mean accuracy across all tasks in the Q&A dataset developed by Dan Hendrycks et al..

Able to get top-1 strict accuracy of at least 90.0% on interview-level problems found in the APPS benchmark introduced by Dan Hendrycks, Steven Basart et al. Top-1 accuracy is distinguished, as in the paper, from top-k accuracy in which k outputs from the model are generated, and the best output is selected.

By "unified" we mean that the system is integrated enough that it can, for example, explain its reasoning on a Q&A task, or verbally report its progress and identify objects during model assembly. (This is not really meant to be an additional capability of "introspection" so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.)

Resolution will come from any of three forms, whichever comes first: (1) direct demonstration of such a system achieving ALL of the above criteria, (2) confident credible statement by its developers that an existing system is able to satisfy these criteria, or (3) judgement by a majority vote in a special committee composed of the question author and two AI experts chosen in good faith by him, for the sole purpose of resolving this question. Resolution date will be the first date at which the system (subsequently judged to satisfy the criteria) and its capabilities are publicly described in a talk, press release, paper, or other report available to the general public.


I think we're reaching a point where the Turing test is no longer useful. If you get into the nitty-gritty of it (instead of just handwaving "computer should act like person"), it's about roleplaying a fake identity. Which is a specific skill, not a general test of competence.


The Turing test seems to be a product of an era where the nature and capabilities of artificial intelligence were still in the realms of the unknown. Because of that it was difficult to conceive a specific test that could measure its abilities. So the test ended up focusing on human intelligence—the most advanced form of intelligence known at that time—as the benchmark for AI.

To illustrate, imagine if an extraterrestrial race created a Turing-style test, with their intelligence serving as the gold standard. Unless their cognitive processes closely mirrored ours, it's doubtful that humans would pass such an examination.


Thank you. It was arguably never useful beyond an intuition pump. It's a test of credulity, of susceptibility to pareidolia, not reasoning ability.


Correct, which is part of the reason the "weak" AGI prediction is relatively far out. Will anyone bother dumbing down an AI to pass a Turing test? "Oh, a human can't write a poem that fast -- it's an AI!"


Yup, missed that, thanks. Has anyone scored GPT-4 on the APPS benchmark?

I believe that if you take GPT-4 multimodal integrated with Eleven Labs and Whisper then there is a shot at passing that extended Turing test, if designed fairly. The wording is still a bit ambiguous.

Also, assembling that particular scale model is probably challenging but not really a general task, and something that could probably be achieved with simulated sensors and effectors given a 3-4 month engineering effort aimed at interpreting and acting on those kinds of instructions (maybe training an existing multimodal LLM and integrating it with some kind of RL-based robot controller?). It would be possible to integrate it with the LLM such that it could report its progress and identify objects during assembly.

So my takeaway is that with some serious attempts and an honest assessment of this bar, an AI would be able to pass it this year or next. I don't know how far GPT-4 is from the 75%/90%, but I doubt it is that far, so I expect that if not GPT-4, then GPT-4.5 or 5 could pass given some engineering effort aimed at the test competencies.

If people really are thinking 2030 or 2040 when they read "AGI" and respond to that poll (I suspect some didn't read the definition) then that would indicate that people are just ignorant of the reality of how far along we are, or in denial. Or a little of both.


You do realize that many, if not most, humans would fail this test, right?


Yes, you'll find that any testable definition of AGI that has not been passed yet would be unpassable for a big chunk of the human population.

In other words, General, Artificial and Intelligent have been passed. That's why a few papers/researchers opt to call these models "General Artificial Intelligence" instead

https://jamanetwork.com/journals/jama/article-abstract/28064...

https://arxiv.org/abs/2303.12003

Or some such variant like "General Purpose Technologies", as OpenAI did.

https://arxiv.org/abs/2303.10130

since "AGI" has so much baggage with posts shifting at the speed of light.


AGI is competing with human culture as a whole.

Individual humans are not exactly the best of all possible tests for AGI.


Yes, but humans as a group can do it. An AGI needs to show a similar number of AGIs can do the same given the same starting template.

The AGI will need to look at all of the tasks written, determine what the success criteria are, and then combine that into a single set of answers. With the instructions in human-readable form, not machine-readable. It can use as many or as few AGIs as it needs to accomplish this.

It's the same as if we gave these instructions to a human with sufficient skill and resources to delegate.


In my opinion a critical part of an AGI is learning on the fly from few examples. Humans do this well.

When humans play (video)games against an AI they will usually find some pattern of behavior that the AI falls into. Once they find it, they can continuously take advantage of this pattern of behavior and abuse it for great success. I think current LLMs still fall into this category. But I think they might be able to handle this relatively well quite soon.


One argument against ChatGPT's AI-ness was that it's wildly unstable: a small change in the input may become a big change in the output. When that's fixed, GPT will still need memory to learn things, as an AI that can't learn isn't really AI. However, the memory can be added easily: it's just a hidden step after every prompt in which GPT is asked to update its memory state (a blob of text, its scratchpad). If GPT is stable, it will update the memory more or less properly. With these two changes, and they are fairly simple, GPT-5 will earn its AI badge. P.S. I find it interesting that "eye" and "AI" sound similar, so perhaps the symbol of the future AI deity will be an eye.
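
For what it's worth, that scratchpad step can be sketched in a few lines. The prompt wording and model name below are illustrative assumptions, not a claim about how any future GPT will actually work:

    from openai import OpenAI

    client = OpenAI()
    memory = ""  # the scratchpad: a plain blob of text carried between turns

    def chat(user_message):
        global memory
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": f"Your memory so far:\n{memory}"},
                {"role": "user", "content": user_message},
            ],
        ).choices[0].message.content

        # Hidden step: ask the model to fold the new exchange into its memory.
        memory = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content":
                f"Current memory:\n{memory}\n\nNew exchange:\nUser: {user_message}\n"
                f"Assistant: {reply}\n\nRewrite the memory, keeping only what is "
                "worth remembering."}],
        ).choices[0].message.content

        return reply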



