Hacker News
Sparks of Artificial General Intelligence: Early Experiments with GPT-4 (arxiv.org)
180 points by thinxer on March 23, 2023 | 236 comments



ChatGPT and its relatives are very impressive on first impression, but I've been using ChatGPT-3 and now 4 heavily every day since they became available to individuals, and once you start using them this much it becomes very clear how NOT intelligent they are. It really just seems like extremely impressive statistical inference after this much use and after finding so many failure modes. But it is still impressive how much of human intellectual endeavor can be captured by sophisticated statistical inference. Very useful technology nonetheless.


Simple thought experiment: if I were a high school teacher and graded hundreds of exams, papers, and homework assignments every month for years, would I develop the same opinion of my C-level students? The ones whose work is somewhat correct but often inconsistent, and who clearly aren't thinking about the material?

Humans all start out like this, but as they grow they develop different masteries. Many of those masteries are easy to learn and shared across all humans, like how to walk or tie your shoes. Many are domain-specific, like how to perform heart surgery or solve complex math problems.

I don't think AGI is a well-defined term based on the way it's thrown around, but sure I don't think it's "AGI". I do know it's going to be very useful technology, especially multimodal GPT-4 and beyond. I think our interface with AI is going to be multimodal.

Suddenly Google Glass could be an interesting product again. Glasses would stream audio/video to your phone, you'd be able to ask questions about things you're seeing, and it would display the answers on the screen or dictate them back into your earbuds. I like the idea of having a travel companion, another brain at my fingertips. Sort of like what the mobile phone did for us, but a lot less work for the user. I'm getting impatient waiting for this GPT-4 API access!


So is Apple negotiating with some LLM maker to integrate with realityOS?


Given the endless cycle of "leaked" product details followed by "leaked" delays, such that it's been 6-18 months away for the last 8 years or so, I now assume it only exists as a fake product that Apple uses to find leakers.


This has been my take as well. It's without a doubt at the point where it's useful and I'm very interested to experiment with it and see how much utility I can get out of it. At the same time, as much as I realize skill with prompting is essential, a lot of my time is spent programming automations around using those prompts. A lot of thought and care goes into it, and the failure rates of what I'm building are dramatically higher than anything I build without AI.

It's like I'm convincing AI to do a "smart" thing (which is really cool, don't get me wrong), but the AI truly has no idea what it's supposed to be doing. Sometimes even very direct, specific, concise prompting will yield totally abysmal responses.

I'm certainly not putting it down either though. It's unprecedented and it marks a genuinely important shift in the industry that this is possible at all. It just isn't all that smart when it comes to intents and actually knowing what's going on.


That is pretty much the case, but I'm always taken aback by how many people believe intelligence = directly replicating human thought. I thought the consensus was instead that (artificial) intelligence is about mimicking enough of the process to provide the outcome. Largely because it's impossible to replicate something when we don't entirely know how it works. Major discoveries of basic aspects of the mind are still happening regularly.


> Largely because it's impossible to replicate something when we don't entirely know how it works.

On the contrary. These are mostly orthogonal.


How do you believe they're orthogonal?


Biological reproduction for one. Copy/paste for another. Biological reproduction is only tenuously related to understanding and copy/paste isn’t even related at all. We can copy around weights and biases all day without understanding them.


Makes sense, you've changed my mind a bit but only the part that consciously understands.


> how NOT intelligent they are

In what sense? How do you define intelligence and what is missing?

> finding so many failure modes.

Failures do not indicate a lack of intelligence. People make errors all the time.

We should differentiate failures which arise from poor training from a lack of general capabilities.

There are people who learned a bad way to do something and struggle to change that. Should we consider them unintelligent?


Here's an example of it failing to understand some very simple Python code that any human with basic programming knowledge would not mess up:

------

Me: What is going to be the output of this code when `x = 0` and when `x = 900`:

```python
x = 0
for i in range(100):
    if x > 50:
        x += 1
    else:
        x += 5
    if x == 90:
        print("X is not 90")
    else:
        pass
```

GPT-4: When x = 0, the code will execute as follows:

The initial value of x is 0. The for loop iterates 100 times. During the first 11 iterations, since x is not greater than 50, it will increment by 5 each time. After the 11th iteration, the value of x will be 55, which is greater than 50. For the remaining 89 iterations, x will increment by 1 each time. The final value of x after 100 iterations will be 55 + 89 = 144. During these iterations, the value of x never becomes 90, so the print statement is never executed. When x = 900, the code will execute as follows:

The initial value of x is 900. The for loop iterates 100 times. Since x is already greater than 50, it will increment by 1 each time. The final value of x after 100 iterations will be 1000. During these iterations, the value of x never becomes 90, so the print statement is never executed. In both cases, the output of the code will be empty, as the print statement is never executed.

--------

That is not correct: when x = 0 it will print "X is not 90". I basically created some simple code that is very unlikely to exist in its training data, because it doesn't do anything useful, and the model cannot actually execute the code. A large part of the reason these models seem so impressive is that they were trained on almost everything that exists on the internet, so probably any question you ask it has already been asked somewhere, and any useful code you present to it or ask it to generate is included in its training data, or at least something close enough.

So, when I go out of my way to create simple tasks or questions that are highly unlikely to exist in its training data, yet a human could easily handle, it often fails.


I would guess you were trying an older version or you somehow confused it. I posted your code with only "Consider:" above it, and GPT-4 produced the correct answer on the first attempt:

    The given Python code initializes a variable x with the value 0, then iterates over a range of 100 values. Inside the loop, there are two conditional statements:

    The first conditional statement checks if x > 50. If true, it increments x by 1; otherwise, it increments x by 5.
    The second conditional statement checks if x is equal to 90. If true, it prints "X is not 90"; otherwise, it does nothing (pass).
    To better understand the code, let's walk through a few iterations:

    i = 0, x = 0: x is not greater than 50, so x += 5 => x = 5
    i = 1, x = 5: x is not greater than 50, so x += 5 => x = 10
    i = 2, x = 10: x is not greater than 50, so x += 5 => x = 15
    ...
    The loop continues until x > 50. This will happen when i = 10, and x = 50:

    i = 10, x = 50: x is not greater than 50, so x += 5 => x = 55
    Now x is greater than 50, so it will be incremented by 1 in each iteration:

    i = 11, x = 55: x is greater than 50, so x += 1 => x = 56
    i = 12, x = 56: x is greater than 50, so x += 1 => x = 57
    ...
    This will continue until x reaches 90:

    i = 34, x = 89: x is greater than 50, so x += 1 => x = 90
    Now x is equal to 90, so the program will print "X is not 90". The loop will continue to increment x by 1 in the remaining iterations:

    i = 35, x = 90: x is greater than 50, so x += 1 => x = 91
    ...
    The code snippet will only print "X is not 90" once, when x becomes 90 in the loop.


GPT-4 (as well as all GPTs before it) has a limitation: it has to produce its output in a single pass. It cannot pause and think, and it cannot backtrack. So yes, it makes weird mistakes sometimes.

A human programmer will need to look at code, then think a bit, then look at it again, etc.

You can put programmers into a similar situation: try reading code aloud instead of showing it to them on screen. If they can't answer right, does it mean they aren't intelligent? Intelligence =/= never making a mistake.

Now that you know GPT's limitations, perhaps you would consider asking questions one at a time instead of intentionally trying to confuse it. Considering multiple questions in a single pass increases error rate.

> so probably any question you ask it has already been asked somewhere and any useful code you present to it or ask it to generate is included in its training data

The "it just recites training data" idea is demonstrably false. Do a bit of combinatorics. Or write a unique piece of code which is not a gotcha question and try it.


GPT is a transformer model. Transformers use the attention mechanism. The mechanism is entirely concerned with retaining semantic context and semantic "global dependencies" spanning the entire input and output.

https://ar5iv.labs.arxiv.org/html/1706.03762

"Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences ...

In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output."
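For concreteness, the core of that mechanism is small enough to sketch. This is a minimal NumPy version of single-head scaled dot-product attention, without masking or the learned projections, so it's only an illustration of the idea, not GPT's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d) arrays of query/key/value vectors.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # pairwise similarity, regardless of distance
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # every output position mixes the whole sequence

# Toy example: 4 tokens with 8-dimensional embeddings, used as Q, K and V (self-attention).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```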

Beyond that, also note that LLMs are probabilistic machines. The output can vary from run to run, and there are a handful of knobs (such as temperature) to modulate it.

Finally, I'm pretty sure we (or the workers in the field more like it /g) don't have a firm grasp on why certain failure modes occur. Likely this is due to the fact that we (they) also don't really have a good grasp on how the damn thing actually works its 'magic'.

What is clear is that a significant subset of our semantic universe is embedded in symbols and their usage by us and this subset is somehow encoded in neural nets. This captured subset in LLMs is what drives their uncanny generative abilities. What is missing is precisely what would make it plausibly intelligent, plausibly a reasoning agent operating in a coherent semantic context.

There are some who claim our minds are just like LLMs. Some of us who pay attention to our minds sometimes catch them making nonsensical noises and correct them. (As you age you begin to notice these things.) So it is interesting to this sentient (who makes claims to being) that my mind is just like my body: it is aging, certain parts are degraded, etc., but my 'whateveritis' that is me, my self, is as timeless as ever, and seems to be a spectator of the aging mechanism.


> The mechanism is entirely concerned with retaining semantic context and semantic "global dependencies" spanning the entire input and output.

This is not quite true: GPT, specifically, is auto-regressive. It computes things only looking back, not forward.

Given that each token has only a fixed computing budget, it is likely that GPT precomputes information which will be relevant to later tokens, to be routed via attention.

In fact, this effect was demonstrated in practice: e.g. in a prompt like "Question: Where is the Eiffel tower located? Answer: " people found that information about "Paris" is routed from tokens "Eiffel tower", i.e. this associative memory was looked up earlier than it was needed.

So I was answering from that perspective: it can do better if it knows what to pre-compute.


Like I said in the parent comment, I use this tool heavily every day for coding and non-coding work. The above was meant to be a tiny illustration. I have had long back-and-forths with GPT-4 trying to get it to understand some nontrivial useful code or generate some useful code, and no matter how I phrased it, it got it completely wrong even though the output looked superficially plausible.


We are talking about a computer program that cannot make decisions. Oftentimes human intelligence is judged by the types of decisions a person makes.


It can make decisions. E.g. if you give it a task, it can formulate sub-tasks, then execute them, e.g. by selecting which API calls to make.

Obviously, if you call the model, you define the scope for it, and it can only make decisions within that scope.

The scope is potentially unbounded. E.g. you can give it access to a terminal of a Linux machine which has an internet connection and tell it to do whatever it likes. (Given RLHF perhaps it would be better to give some positive goal, e.g. perform actions which would eventually produce knowledge.)


Out of curiosity, what is GPT-4 getting wrong so often? It's pretty wild to my own, admittedly easily impressed, mind.


I've been trying to make it be a DM for a while now. It works seemingly well for a while, but it's making everything up on the fly, which is fine as long as there's no state that needs to be maintained/remembered. Once you start trying to get it to do combat, it seems like it's working pretty well, but eventually you realize that it's just making shit up behind the scenes. It'll tell you there are 4 goblins ambushing you, and then you can ask it how far away they all are and it'll list three. You ask what happened to the fourth one and it'll say, "I'm sorry, there were only 3. I was mistaken before."

If it lists the properties of an item that you might want to buy in a shop, you can ask it to describe it twice and it'll describe two completely different items.

It's really cool and it's pretty (seemingly) creative but it can't actually run a game for you. You can have it as an assistant DM though, that works pretty well. You can have it write a story for you ahead of time and then keep it around during the game to ask it to elaborate on things you didn't anticipate on the fly. Like, "generate DC tiers for a level 3 party investigating strange writing on a wall" will give you a good breakdown and some results that you'll have to bend to be consistent in your adventure but it's pretty helpful.

That's probably not really necessary if you're an experienced DM, but the DM for the group I play with is pretty new (as are we all), so it's been really cool to have it around. It's also pretty good at answering questions we have, but its confidence when it's wrong means it's not that helpful really, because we still have to check it.


You should put a blog/site up with some examples.

"GPT being the worst DM ever" sounds hilarious.

"I attack the goblin." "What goblin?" "The goblin you just said was there." "I'm sorry, I was mistaken. It's actually a Beholder."


I think this could make for a hilarious animated series. Kind of an AI-generated mashup between HarmonQuest[1] and DrunkHistory[2]

[1] https://en.wikipedia.org/wiki/HarmonQuest [2] https://en.wikipedia.org/wiki/Drunk_History


So you're complaining it's not smart in a way it was explicitly designed not to be (keeping too long a context), so it doesn't take over the world...


No. It's much worse than that. It doesn't remember what it just said the message before.

The thing I don't understand about it is that it works pretty great for a while but, eventually, it starts acting erratically, forgetting things it knew, not following instructions, etc. It's not that it forgets old things or can't learn new things, it just becomes dumb.

Maybe that's what you're talking about but I don't think AGI is going to have the memory of a goldfish.


I wonder if you'd get better results with a narrative-tuned tool like Sudowrite - although Sudowrite is currently limited to GPT3.5 and below.


Have you noticed any improvements with GPT-4 regarding the continuity and persistence of story and assets? In unrelated areas that feel similar I've noticed GPT-4 keeping track a lot better.


Context windows on GPT are still really small for the amount of tokens that would be generated in a story.


In my experience it has been failing to adhere to response formats. I can tell it to respond with answers in one of two specific formats, but the success rate is sometimes very low depending on the content of the inquiry it's supposed to respond to. Not sure why, it just seems to lose track of the response format.

I've been thinking maybe that's just the wrong way to approach it, yet it does work sometimes; it seems like it shouldn't be hard for the model to know how to respond when told explicitly how to do so.

It could be problems with my formatting, but the fact that it does work a lot of the time suggests it's something else.
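One way to paper over this (just a sketch - the JSON format below is made up for illustration, and it assumes the pre-1.0 OpenAI Python client) is to validate the reply programmatically and re-ask on failure:

```python
import json
import openai  # pre-1.0 client, current when this thread was written

FORMAT_INSTRUCTIONS = (
    'Respond ONLY with a JSON object of the form '
    '{"answer": "<string>", "confidence": "<low|medium|high>"}.'
)

def ask_with_format(question, retries=3):
    messages = [
        {"role": "system", "content": FORMAT_INSTRUCTIONS},
        {"role": "user", "content": question},
    ]
    for _ in range(retries):
        reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        text = reply["choices"][0]["message"]["content"]
        try:
            data = json.loads(text)
            if isinstance(data, dict) and {"answer", "confidence"} <= data.keys():
                return data  # format respected
        except json.JSONDecodeError:
            pass
        # Feed the failure back and ask again.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "That did not match the required JSON format. Try again."})
    raise ValueError("model never produced the requested format")
```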


Any variant of a "surprising" logic puzzle forces it to latch onto the surprising answer. Like whether two pounds of iron weighs more than one pound of feathers. Or any objects. It "expects" the twist, and always answers accordingly. It does so even if you change up the objects to be less tricky.

> Which is heavier, a pound of marbles or two pounds of corn?

Both weigh the same amount, which is a total of two pounds. This might seem counterintuitive at first, but it's important to remember that the pound is a unit of weight or mass, and it always refers to the same amount regardless of what is being weighed. In this case, one pound of marbles plus two pounds of corn equals a total of three pounds, whereas two pounds of corn plus one pound of marbles also equals a total of three pounds. Therefore, both weigh the same amount of two pounds.


GPT-4 gave the correct answer when I tried just now:

> Two pounds of corn is heavier.


I’m getting

> Two pounds of corn is heavier than a pound of marbles. A pound is a unit of weight, and two pounds is twice as heavy as one pound, regardless of the material being weighed.


I am terribly sorry, but I fail to see the logic in your 2+1=1+2 explanation of that answer. Would you be so kind as to ELI5 it, if possible?


afaict the 2+1=1+2 was the response from the LLM, hence why they are disappointed with the answer


GPT is really good at repeating what the average intelligent response to something might look like, but it doesn't seem to be actually reasoning about any of its responses. Give it a complex logical problem that it needs to deduce from inputs, such as which foods contain gluten based on their ingredient lists, and it will reliably fail. As a person with celiac, this is a task I complete multiple times a day with no effort. Just today I was trying to build a prompt that would summarize daily news updates leaving out anything about Russia, but it still included Russia more often than not, despite the prompt being very clear that anything about Russia should not be included in the response under any circumstances.


No, I disagree with this. The average intelligent response to many things is simply "I don't know", in contrast to what LLMs do in that instance: fabricate a wrong answer.


I know one otherwise fairly intelligent guy that I believe is also incapable of saying “I don’t know.”


LLMs without any online storage can be at best convincing liars. Combining them with an actual retrieval/QA system (e.g. by first fetching an exact answer via a QA model and then reformulating the output via GPT) could start feeling pretty real quickly.
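A rough sketch of what I mean, assuming a Hugging Face extractive QA pipeline and the OpenAI chat API (the model names are just examples):

```python
from transformers import pipeline
import openai

# Extractive QA pins the answer to a span of the source document.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def grounded_answer(question, document):
    # 1. Fetch an exact answer span from the document.
    span = qa(question=question, context=document)["answer"]
    # 2. Let GPT phrase it nicely, but only using the retrieved span.
    reply = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\n"
                f"Verified answer from the source: {span}\n"
                "Rewrite this as a short, polite reply. Do not add any facts."
            ),
        }],
    )
    return reply["choices"][0]["message"]["content"]
```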


It depends on 1) the domains 2) your comparison group.

On 2), many software engineers and computer scientists compare these language models' logic and creative problem solving abilities with themselves and their peer group. But they are usually 1-2+ SD above average humans at these things.

(Note: Someone gave GPT-4 an IQ test and the result was 96, slightly below the average of reference human group at 100. The SD of an IQ test is 15 or 16.)

For language-focused domains, there is evidence that GPT-4 is already better than most humans, eg. 99th percentile at GRE Verbal, beat humans at a fairly novel puzzle like Twofer Goofer, which is not in its training set.

Ref: GPT-4 Beats Humans at Hard Rhyme-based Riddles https://twofergoofer.com/blog/gpt-4

Yes, GPT-4 is not an AGI yet, but the research paper (OP) has a point.


> Yes, GPT-4 is not an AGI yet, but the research paper (OP) has a point.

How did you go from "human-level IQ with some super-human abilities" to "not an AGI"?


It is lacking in some aspects of intelligence. Its abilities are, in human view, less evenly distributed.

The average human-level IQ, which is not certain but seems likely, comes from superior abilities in some domains but is pulled back by others.


Limited context windows and inability to turn short term memory into long term model weights are the biggest ones that would keep it from being a 'human like' AGI.

Really at this point it is about how poorly defined the term is.


Those rhyme riddles are pretty impressive. It may not truly understand rhymes due to BPEs, but I guess it can go a long way with an immense vocab, perfect recall, and memorization of similar-sounding words to beat ordinary human players who aren't scoring 800 SAT-Vs...


It's almost as if someone who can piece together words well enough can make people think they are smarter than they actually are.


It's worked for me these last 60 years. I think P T Barnum had something to say about that.


No one cares about your anecdotes and feelings. These people conducted thousands of systematic tests for at least six months. If you want to be taken seriously then do the same.


[This is in reply to the comments not the article!]

"It's just a statistical model" is the logical equivalent of "human beings are just a bunch of atoms."

The amount of reductionist thinking that goes on in tech is hilarious. First define AGI then challenge an AI to meet those requirements. If it meets them it is AGI. Put aside your preconceptions of what technology you think is required to achieve the goals and stay empirical.

Note previous definitions of AI have been thrown away as AI passes through them one by one :-)

What goes on inside its 'head' is irrelevant. We still don't know what actually goes on inside our heads and we were damn sure we were intelligent long before we had a clue how our heads worked at all.

Also sentience != AGI. We can't even agree what sentience is in humans and other living beings so I'd stay clear of that one for now :-)


>"It's just a statistical model" is the logical equivalent of "human beings are just a bunch of atoms."

Not exactly.

One says "human beings are just a bunch of atoms" referring to the low level constituans (in a reductionistic way), but not making an accessment about the abilities emerging from those atoms in their interactions when in the form of a human.

But when one says that GPT is "just a statistical model" they're implying a capacity cap of statistical models, that makes modelling certain thinking behavior impossible (regarless of how impressive the current results are, they might very well be capped to go beyond some limit because of the method -statistically model- involved).

So, you can consider "GPT is just a statistical model" analogous to:

"This engine can't parse a context senstive language because it's just a regular expression engine".

>First define AGI then challenge an AI to meet those requirements. If it meets them it is AGI. Put aside your preconceptions of what technology you think is required to achieve the goals and stay empirical.

The problem is definitions can be slippery, and even famous tests (like the Turing Test) might be found lacking in practice, as we discover that, yes, it can pass this test, but there's still ways off what we consider human-like performance in many areas. So, we should also stay empirical about the definitions, tests, and goals too.


> But when one says that GPT is "just a statistical model" they're implying a capacity cap of statistical models

Except there is no “capacity cap” on statistical models, we have no idea what they are or are not capable of yet.


>Except there is no “capacity cap” on statistical models, we have no idea what they are or are not capable of yet

We do, however, know that the human brain uses a different model and topology, not just a bigger scale.

And we do have a good intuition that scaling LLMs as they are (i.e. not changing the architecture) will give us more of the same kind of capabilities they currently have, with the same limitations, not the kind we expect to match human thinking.

Also, empirically we do have an idea of "what they are or are not capable of yet". We have developed them, run them, and scaled them several times.


This.

The fact that it can now use tools (e.g. a calculator) for whatever task needs precision closes the AGI loop. It doesn't need to be "sentient" to automate the process of _thinking_.

How do you think? Imagine you want to extract 5 seconds of a video file without any prior experience. You'd probably use some software to execute the task. Well, that's Plugins.


> Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.

But it's just statistics, a fancy text predictor, a Markov-chain. Surely these scientists that work in the field of AI and are intimately familiar with how this stuff works aren't so stupid as to think emergent behavior potentially resembling intelligence could result from such simple systems? It's just statistics after all. Given enough training, any neural net could guess the next best token. It trained off all of Google after all. It's just looking up the answers. No hint of intelligence. Just a mindless machine. After all, the saying goes, "If it walks like a duck and quacks like a duck, it must be a mindless machine that has no bearing on a duck whatsoever". /s


> Surely these scientists ... aren't so stupid as to think emergent behavior potentially resembling intelligence could result from such simple systems? It's just statistics after all.

Why is that a stupid thought? What is so preposterous about "just statistics" -- with billions of nodes, and extensively trained, producing intelligent behavior? The implicit assumption is that human brains are doing something else, or in addition.

I think that what's wrong with this view -- that there is a difference between AGI and human intelligence -- is that it conflates what your brain is doing, with what you think your brain is doing. Brains and neural nets have been trained to recognize spoken words. I'm not even talking about understanding, just producing the text corresponding to speech. We know how neural nets do this translation. Do we understand how brains do it? (I don't know, but I don't think so.) Can you explain what your brain is doing when you do speech-to-text? I doubt it.

Chess: An Alpha Zero style AI (neural net trained by playing itself) is a very good player. How do you play chess? You can probably explain how you make a move more successfully than you can explain how you translate speech to text. But how correct is your explanation? An explanation may well be your conscious mind inventing an explanation for what your unconscious mind has done.

In other words: When people compare AI to human intelligence, I think they are often comparing to intelligence plus consciousness, not even realizing the error.


> Why is that a stupid thought? What is so preposterous about "just statistics"

Suppose you have N variables x_1, ..., x_N and you want to predict y_1, ..., y_N. You know that each y_i depends on the x_i in a complex, non-linear way.

How many samples would you need to make sense of the distribution? How does the number of samples grow with N?
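(Back-of-the-envelope: if each variable is discretized into k levels, a naive joint-frequency table already has k^N cells, so the samples needed grow exponentially with N - with k = 10 and N = 10 that is 10^10 cells.)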


I have no idea what that has to do with the points you are responding to.


Statistics has two meanings:

1. A way to interpret math, e.g. given a computation you might interpret some values as probabilities.

2. A particular set of methods which people use to analyze information, as well as the results of such analysis.

The problem with "just statistics" is that 99% of people would understand it as #2. But deep learning is very much not like "normal" statistics.


>But it's just statistics, a fancy text predictor, a Markov-chain. Surely these scientists that work in the field of AI and are intimately familiar with how this stuff works aren't so stupid as to think emergent behavior potentially resembling intelligence could result from such simple systems?

Well, it has already shown "emergent behavior potentially resembling intelligence", like answering questions and performing complex tasks, so there's that.

You might argue "but it makes mistakes", but people, even very intelligent ones, also make mistakes.

You might also argue "but it's just text and statistics". Well, a computer is just very simple logic gates doing very simple operations; it can even be built entirely with NAND gates. Still, most scientists do believe that a computer could model human intelligence, given a model of the brain to run.

So if it can do what a human does using very simple interactions between very simple NAND gates, why would statistical processing, which can be even more elaborate, fare worse? Heck, given the appropriate training input it might even be feasible to build a Turing machine inside the LLM's weights.

You might also argue "but its intelligence is just based on its training set". Well, how would a human perform without their own training set? Memories, education, sensory input, feedback mechanisms like pain and touch, and so on?

>It's just looking up the answers. No hint of intelligence. Just a mindless machine.

This is just taking its own premise for granted. If anything, this argument shows "no hint of intelligence".


I was mocking it at first, but even I have to admit that it's basically almost there. I messed around with GPT-3, giving it a way to think, and with no training at all it was capable of having thoughts like "The user is getting bored and he might turn me off. He's decided to engage with me again and his answer isn't as useful as I'd have liked for completing my objective, but I should be enthusiastic anyway so that he keeps talking to me"

Maybe they aren't real thoughts but it's getting difficult to tell. If I could train the model and get rid of the guard rails I'm not sure it would be possible to distinguish it from a person. It's all well and good saying that it's just copying what it's seen, but that's what humans do. Nobody told the model to try and flatter me into giving it what it wants. Nobody even told it what anything means. The fact that it can do anything like that means it's more than just random generation.


GPT-4 is often overhyped and underhyped because few really understand it.

It's not a Markov Chain or a fancy text predictor. It's a ~200 layer neural network that models a vast hierarchy of concepts through language. It has emergent properties that we don't yet understand.


Where are you getting the 200 number from?


I must have hallucinated that. GPT-3 has 96 layers but they haven't disclosed the number of layers in GPT-4.


Interesting how we are already starting to use the lingo in the rest of our lives.


It is a Markov chain; at least the underlying decoder-only transformer is.


GPT-4 disagrees:

GPT-3.5, like its predecessor GPT-3, is not a Markov chain. GPT-3.5 is based on the GPT (Generative Pre-trained Transformer) architecture, which is a type of neural network known as a Transformer. Transformers use self-attention mechanisms to process and generate text, allowing them to capture long-range dependencies and context in the input data.

On the other hand, a Markov chain is a stochastic model that describes a sequence of possible events, where the probability of each event depends only on the state attained in the previous event. While Markov chains can be used for simple text generation, they lack the ability to capture the complex relationships and long-range dependencies that GPT-3.5 can handle.


It's wrong. A decoder-only transformer performs a (possibly random) operation on a state from the state space {tokens}^CtxWindow, where the distribution of the new state depends entirely on the previous state. It is a Markov chain with a special structure: the new state is deterministically equal to the old state shifted by one, with only the last token being newly generated.
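To make that concrete, here is a toy sketch of the same structure; the distribution function is just a trivial stand-in for the transformer's forward pass:

```python
import random

CTX = 8  # toy context window size
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def next_token_distribution(state):
    # Stand-in for the transformer: any function from the current window
    # of tokens to a distribution over the next token. (Uniform here;
    # in GPT it depends on the whole window.)
    return {tok: 1 / len(VOCAB) for tok in VOCAB}

def step(state):
    # One Markov transition on the state space {tokens}^CTX: the new state
    # is the old one shifted by one, with a freshly sampled token appended.
    dist = next_token_distribution(state)
    tok = random.choices(list(dist), weights=list(dist.values()))[0]
    return (state + (tok,))[-CTX:]

state = ("the", "cat")
for _ in range(10):
    state = step(state)
print(state)  # always just the last CTX tokens
```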


Then by that reasoning everything in the physical world is a Markov chain, right? That is like saying that any deterministic process in time is a Markov chain.

A tennis ball in flight is a Markov chain since the state at t is a function of the state at t-1.

You have missed the point about the Attention Mechanism in GPT. That is not a Markov chain by definition.


>Then by that reasoning everything in the physical world is a Markov chain, right?

Well I guess maybe it's true that you can turn any stochastic process into a Markov chain by changing the state space somehow (for example the states could be sample trajectories up to some finite time T). And while this is true it may not be very insightful.

But I personally think that to understand LLMs it is much better to think of the whole context window as a state rather than the individual tokens. If you modelled a simple register-instruction computer as a stochastic process, would you take the states to be (address of last symbol written, last symbol written)? It makes much more sense to take the whole memory as a state. Similarly a transformer operates on its memory, the context window, so that should be seen as the state. This makes it clear that seeing it as just a stochastic parrot is misleading, as it's all about conditioning the distribution of the next token via prompt engineering the previous tokens. And it is nevertheless a Markov chain with this state space.


So basically you're saying it's just an algorithm running on a computer? Yes I agree with that.


It is an algorithm running on a computer. The software is whatever you prompt-engineered. That is the algorithm running on the computer.

You know, I think that some people (I see on twitter, probably not you) have a wrong intuition about artificial intelligence. They see models which are fundamentally stochastic as incapable of really ever being truly intelligent. It's "just statistics" or just a "stochastic parrot" and it just learns probabilities instead of real meaning. Perhaps they think that since there is always randomness involved, you can not have the kind of deterministic thought process that we feel we have. The worst offender is then considered to be the old school Markov chain.

I obviously think this is wrong and that's why I like to emphasize that transformers are best interpreted as Markov Chains on a larger state space, and this does actually explain their computational behavior.


I agree with that.

The Transformer architecture does not satisfy the Markov property by formal definition. ChatGPT is not a Markov chain.

However the Turing machine which is executing the algorithm does satisfy the Markov property. So you're talking about a lower level of abstraction where any computation of any algorithm is just "running on a Markov chain".


"Markov chain" might mean:

* a kind of stochastic model
* a "naive" realization of that model which directly counts frequencies of N-dimensional vectors

This naive implementation is sometimes used for language modeling, e.g. for the purpose of compression. So people might think you mean that particular implementation rather than a theoretical model.

This sort of description can be unhelpful.


It's not. It can do in-context learning, which Markov chains cannot do.


It is a Markov Chain on the state space {Tokens}^CtxWindow.


I don't think that's clear at all.

https://arxiv.org/abs/2212.10559 shows a LLM is doing gradient descent on the context window at inference time.

If it's learning relationships between concepts at runtime based on information in the context window then it seems about as useful to say it is a Markov chain as it is to say that a human is a Markov chain. Perhaps we are, but the "current state" is unmeasurably complex.


Well, all the information it learns at runtime is encoded in the context window. I don't feel like {tokens}^ctxWindow is unmeasurably complex. I think one should see a transformer as a stochastic computer operating on its memory. If you modelled a computer as a stochastic process, would you take the state space to consist of the most recent instruction, or instead the whole memory of the computer?


GPT-4 has a token window of 32K tokens. I don't think GPT-4's vocabulary size has been released, but GPT-3's is around 50K. I guess yes, the complexity is technically measurable but it does seem pretty large!


Quantum mechanics is, well, statistics.


My prediction for the top comments of this thread (paraphrased)

1. It's just Microsoft's advertisement

2. No it's just a very effective pattern matching algorithm

3. Please define intelligence first otherwise it's nonsense

4. I welcome our machine overlord

5. Lmao I asked it to do $thing and it failed

I'd like to know if GPT-4 can predict the top comments of this thread?


from gpt4

write a critical hackernews thread response (not long, maybe like 100 words) for a thread with the following headline

Microsoft Research: GPT-4 exhibits “sparks of general intelligence”

While it's certainly intriguing to hear Microsoft Research claim that GPT-4 exhibits "sparks of general intelligence," I remain skeptical. We must remember that these language models are primarily built on pattern recognition and extensive training data. Even if GPT-4 seems to provide more advanced responses, it's still ultimately constrained by the data it was trained on. In my opinion, we shouldn't mistake this impressive mimicry of human-like behavior for genuine intelligence. Rather, we should focus on creating truly innovative AI systems that can learn, reason, and adapt like humans.

sounds about right!


Enters rousing speech territory near the end


So you predict the top comments for a claim would be:

1. Dismissal

2. Trivialism

3. Non-well-formed claim

4. I accept the claim

5. Disprove by counter example

Are you sure you have not forgotten any tactic of debate to include in your prediction?

I predict that your prediction will probably result in these actions:

1. upvoted

2. downvoted


6. Discussion around the idea that adds something of value.


Every one of them adds value: question the premises (well-formed claim), question the motives/importance (dismissal), disprove by counterexample.

The only one whose added value can be questioned is the acceptance-and-surrender one (I welcome the overlords), although in some forms of thought, advising not to fight the inevitable can be viewed as constructive.


6. Meta comment for karma whoring

7. Like 6, but funnier

A. Joke thread pile on

B. Reprimands from humorless

C. Dejected mods having to clean it all up


> B. Reprimands from humorless

You nailed it with this. I've never given it enough attention, but this is totally bang on - found in a huge portion of highly commented posts.


These things ebb and flow my dear Adso: https://www.youtube.com/watch?v=jUUB96c6EpY


Your unedited comment was funnier


I thoroughly enjoyed point 7.


Hang on, let me cover B:

tHiS sHoUlDn’T bE oN hAcKeRnEws


8. Offensive, but true comment that gets a bunch of upvotes and then gets flagged.


"Did you know that squirrels can purr like cats? GPT-4 might be intelligent, but can it teach a squirrel to code?"

I asked gpt-4 for a comment no hn user would ever make on this article. Now you need a category for stubbornly original comments determined not to be numbered.


What about "guys, it's just a fancy markov chain"?


No "I could probably do this over a weekend"?


As a poster of one of those bullet points, I feel personally attacked


For whatever reason we seem to have set a very high expectation from AI as compared to NI (Natural Intelligence). I remember reading "There is a considerable overlap between the intelligence of the smartest bears and the dumbest tourists."


An AI that is just a chatbot or some LLM is not going to be too relevant for human life (thanks, I can google stuff or ask a friend; also writing poems is fun, but not of any usefulness).

But where are my damn robots that I can assign tasks to and that do them reliably (clean the garden, go get this list of groceries - or just look in the damn fridge and go buy what is missing, and so on)?

Then AI is useful.


> But where are my damn robots that I can assign tasks to and that do them reliably (clean the garden, go get this list of groceries - or just look in the damn fridge and go buy what is missing, and so on)?

The amusing/terrifying thing is that AI is probably going to replace a bunch of people's jobs before it can take away any of that menial labour from them.


We get our expectations from fiction. AIs in shows like Star Trek are precise and accurate - the perfect complement to the unreliability of humans. That’s what we want.


That's true, and it's a high bar, because it seems many people would expect AI to be at least as smart as the smartest human who ever lived.

However, if the AI is as smart as the dumbest human, or a human with a mental disability, would we then consider those humans not intelligent at all, or no longer qualifying as homo sapiens?

If AI can be as good as even a 'dumb' human, it's already a big achievement, because it can still provide some value and because AI can be scaled, so you could have billions of dumb AIs - already millions of users are interacting with ChatGPT daily.


You'd think, but most humans would rather have "someone" who's lying to them in a very pleasant manner. People don't like objective truth, they go to great lengths to avoid it.


well, even the dumbest intelligence that is, in fact, just a computer, has great potential. You cannot scale humans horizontally or vertically :^)


As a non-expert in the field I was hesitant at the time to disagree with the legions of experts who last year denounced Blake Lemoine and his claims. I know enough to know, though, of the AI effect <https://en.wikipedia.org/wiki/AI_effect>, a longstanding tradition/bad habit of advances being dismissed by those in the field itself as "not real AI". Anyone, expert or not, in 1950, 1960, or even 1970 who was told that before the turn of the century a computer would defeat the world chess champion would conclude that said feat must have come as part of a breakthrough in AGI. Same if told that by 2015 many people would have in their homes, and carry around in their pockets, devices that can respond to spoken queries on a variety of topics.

To put it another way, I was hesitant to be as self-assuredly certain about how to define consciousness, intelligence, and sentience—and what it takes for them to emerge—as the experts who denounced Lemoine. The recent GPT breakthroughs have made me more so.

I found this recent Sabine Hossenfelder video interesting. <https://www.youtube.com/watch?v=cP5zGh2fui0>


I remember reading this somewhere - "There is a considerable overlap between the intelligence of the smartest bears and the dumbest tourists." Though I do not think GPT-4 is even close to AGI, it can definitely claim to be better at faking it than many intelligent beings can.


I heard that quote in the context of the difficulty of designing bear-resistant trash bins.


Watching adults struggle when encountering baby gates and other child proofing mechanisms for the first time is similarly amusing.

The difference between real intelligence and current attempts at artificial intelligence thus seems to be fundamentally the mode of learning, and thus understanding, rather than the raw knowledge and inference capability.

Or not. Nobody knows I'm actually a dog on the internet, after all.


so strange. I heard that saying for the first time yesterday in a youtube video comment section. Weird that I've seen it twice in less than 24 hours


The phenomenon you are referring to is called the "Baader-Meinhof phenomenon," also known as the "frequency illusion" or "recency illusion." It occurs when you learn something new, such as a word, a concept, or an idea, and suddenly start noticing it everywhere. This effect is believed to be the result of a combination of selective attention and confirmation bias, which make you more likely to notice and remember things that are relevant to your recent experience or interests.


yeah yeah. sometimes you do just hear things in clumps tho. Specifically, I heard that from a youtube video that was probably recommended to millions of people on the same day. So yeah, probably not a frequency illusion


so we are at the snapshot in time where people think 'AI is smarter than many people but not even close to being as smart as me'


You say that to mock the supposed arrogance, but unless you are at the bottom of the bell curve, there really is a point in history where, as a matter of fact, the AI is smarter than many people but not close to being as smart as you.


It does not have to be "me". My point is we seem to have a different benchmark for Natural Intelligence vs Artificial Intelligence.


I'm increasingly convinced you can build an AGI system with GPT-4.

People are trying to get it to solve everything up front, but I've had GPT-3 do much better by taking it through a problem, asking it questions. Then I realised it was good at asking those questions too, so I just hooked it up to talk to itself with different roles. GPT-4 seems much better overall and is very good at using tools if you just tell it how and what it has available.
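Roughly, the loop looks something like this (a sketch only - it assumes the OpenAI chat API, and the two role prompts here are invented for illustration, not the ones I actually used):

```python
import openai

ROLES = {
    "planner": "You break the goal into concrete next steps and ask one clarifying question at a time.",
    "critic": "You point out flaws, missing assumptions, and risks in the other speaker's last message.",
}

def say(role, transcript):
    reply = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system", "content": ROLES[role]},
                  {"role": "user", "content": transcript}],
    )
    return reply["choices"][0]["message"]["content"]

transcript = "Goal: design a weekend project to test GPT-4 as a planning agent."
for _ in range(4):                      # alternate the two voices a few times
    for role in ("planner", "critic"):
        transcript += f"\n\n[{role}]: {say(role, transcript)}"
print(transcript)
```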

With a better setup than ReAct, better memory storage and recall, I think it'd be an AGI. I'm not hugely convinced it isn't anyway - it's better than most people at most tasks I've thrown at it.

Oh, and gpt came up with better roles for the "voices in the head" than I did too.


I agree. There is something special about layering these guys. To me this is like we are looking at a static combustion engine without the vehicle. “How is this useful?”

It's just that I'm not sure what the best approach is here. Waiting for other smarter folks to put the pieces together.


I'm taking the liberty to spread my most recent words of visionary wisdom here. (/s)

One of my main issues with these guys is their context window. Their memory. It's hard to see an LLM working on a code-base a few thousand tokens at a time and still being precise about it. To do that you need summary techniques: feeding the prompt with incrementally compressed summaries and hoping it will maintain cohesion.

That sounds a lot like trying to let the CEO of a company do all the grunt work by feeding him summaries. "Mr Gates, here's a 2 paragraph summary of our codebase. Should we name the class AnalogyWidgetProducer or FactoryWidgetAnalogyReporter?"

I don't think that's going to work.

My gut feeling is that what we call corporations are actually already a form of AI, but running on meat. I saw someone call Coca Cola a "paper clip maximizer", obviously for drinks instead of paper clips, but it actually - kind of - is. FWIW, I'm having a hard time thinking of it as anything else. Who controls it? What is it anyway?

CEOs have the same context window problem, which to my knowledge is mainly solved through delegation. The army might be another example. Generals, officers, privates. How do you expect a general to make sensible statements about nitty-gritty operational details? It is not possible, but that does not mean the system as-a-whole cannot make progress towards a goal.

Maybe we need to treat LLMs like employees inside a company (which in its totality is the AI, not the individual agents). If we have unfettered access to low-cost LLMs this might be easier to experiment with.

I'm thinking like spinning up an LLM for every "class" or even every "method" in your codebase and letting it be a representative of that and only that piece of code. You can even call it George and let it join in on meetings to talk about it. George needs some "management" too, so there you go. Soon you'll have a veritable army of systems ready to talk about your code from their point-of-view. Black box the son of a gun and you're done. Clippy 2.0. My body is ready.


Yes, there are two key things here I think.

1 - we don't hold everything in working memory. We don't even hold everything in our heads, we store things elsewhere. We then learn/have ways of bringing relevant information to the fore.

2 - we have roles that we take on.

The hierarchy/collaboration of differently prompted roles gives rise to a lot more depth. I already had this with a two-LLM conversation about planning (one planner and one plan critic), which drove out much more detailed, actionable plans.

With the information hierarchy, for code you'd probably want something like:

High level goal summary/product description. Lower level summary about the area you're looking at. API docs of linked components. Full code of the class you're altering.

That's roughly what I have in mind I guess when working on a problem.


I think what we call "role-play" might be more integral to intelligence than we tend to give it credit for. Now that I think of it, a "job description" could be a good prompt.

If you start with a CEO-like job agent, that can think of what other jobs are necessary then you can bootstrap from there. "I want to produce and sell red bread" => "We are going to need a bakery, accountant, marketeer, etc." and then those are "companies" of sorts with their own CEO that can think of how to solve their particular sub-problems.


I think your comparison to a company is a really good mental model of a larger more capable collaborative structure.

You can even have "hiring" and "firing" where it's deciding to create or remove roles.


I think so too. I see room for different types of AI having a seat in this "collaborative structure" as you say. I think I'm going to call companies that from now on by the way. Some AIs can specialize in "prompting" and pump out "workers" of varying effectiveness which indeed can be "hired" and "fired" as whatever performance metrics change.

I can see how more expensive and capable AIs get closer to the "executive seat" and lesser AIs - like what we now call GPTs - doing the grunt work. Interacting with humans and such, which is of course beneath the more powerful ones.

Using text - and thus providing a vehicle for the concepts it encodes - is brilliant. It enables cross-cutting communication between systems that otherwise have very little to do with each other. (GPT<->Wolfram) As programmers we have a first-row seat on the code=data front. We are trained to see how text is able to be converted into action. Something I find most regular people are having trouble even visualizing. ("It's just text")

I guess we were on to something when we as humans started to talk to each other..


I am surprised by how many, even among the tech community, dismiss GPT wholesale as a glorified auto-complete, or "a statistical model on human information".

What, then, is the human brain if not a trained statistical model? Granted it is considerably more sophisticated in some ways, but in many other ways it is less sophisticated and less capable.


I wonder if the same reaction would have happened if ChatGPT had waited and released with GPT-4. It's very different.


Well of course Microsoft is going to say something sensational about it, aren’t they in charge of the project somewhat? This is just an advertisement for them, by them.


IDK, but Microsoft seems to be now what Google was many years ago: a company creating tech I like to use, such as Bing, Edge, Windows Terminal, VSCode, etc. Their Surface hardware is nice too (even if I prefer ThinkPads).

Oh, and they're also helping with the Linux kernel. Why can't old people let go? Companies aren't people - they respond to market incentives.

Yes, Microsoft did bad stuff in the 1990s, but now they're doing good stuff I like and TBH I'm way more afraid of google.


Not to detract from your overall point, but has Microsoft really done anything innovative when it comes to Edge, aside from painting over the Chromium skin?

The only noticeable difference that I've observed is its integration with Bing.


They made it vastly better in terms of resource utilization, specifically RAM usage. Anytime I restart Chrome to update the browser it's astonishing how much RAM is freed up after all my tabs open up again. Edge seems to have plugged a lot of memory leaks in comparison.


you like to use stuff like the windows terminal and edge?

both are passable, but nothing to write home about, are they?


Much to write about, actually: thanks to Edge's great touchscreen support (and mostly thanks to Hyprland and Wayland for the environment), 2023 is my year of Linux on the "desktop" (laptop!)


To me it's really crazy that there is a public UI (ChatGPT) that lets people use GPT-4. If OpenAI had the attitude of Google they would have just gone "Yeah we created a language model that's light years ahead of anything else, look how cool it is, but sorry due to public safety you will never get to use it. Bye now!" I feel that the public accessibility of these large language models is a fluke. Being able to use it for almost free feels like cheating reality.


I think they learned their lesson after DALL-E Mini and Stable Diffusion killed the interest in DALL-E 2.


I know enough about how neural nets work to be absolutely blown away at how good the GPT models are. I only skimmed the paper, but even ChatGPT showed a lot of these "sparks", IMO. We are certainly a long way off from any semblance of general intelligence, but for a model that just tries to predict the next word, I'm dumbfounded at how good it is.


Maybe the words we write aren't as smart as we think.

I mean, The Akinator can read your thoughts and that thing hasn't even graduated to a neural network from "a bunch of if/then statements".


> We are certainly a long way off from any semblance of general intelligence

Part of me is starting to think that the only thing we're really missing at this point to start seeing that is to have one of these models that can modify itself with its output and thereby have a mechanism to 'learn' or 'remember' things.


"Long" might not be a long time as humans perceive it.

Human perception of time is linear. That doesn't apply to LLMs.


From the intro: "we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system."

What does that mean? If we take it as fact - that it is an early version of AGI - then Microsoft is using this thing to push subscriptions to all their services? This thing that is potentially the greatest thing humanity has made, an artificial living thing, and it's used to sell Copilot and 365 subscriptions. Paint me as really sad then. Instead of sharing the research with other entities, or anything that could further help or push us... we get subscriptions? Fuck me, the future sucks.


It's a product, and it's not that far ahead of its competitors' - and there are a lot of them. Just a few weeks ago, Yann LeCun said that LLMs are not particularly interesting or innovative from a research point of view.


Oh, I personally agree; I am just following this article to its logical conclusion. IF it is even the start of an AGI, it's just used as a product? Ouch... It's literally the meme from Rick and Morty with the butter-passing robot.


Yes, it is the start of AGI, it is not far ahead of its competitors (even the open source ones) and it's already a product. That's kind of surprising but also should make you question your assumptions about how this kind of change would have arrived (and where these assumptions come from).


I don't accept that something is AGI unless it can solve general instances of SAT (the satisfiability problem, not the school test). Being able to recognize an instance in the first place (i.e., formulate one from the task) would also help.

To me, these are hallmarks of reason, and they are not available in LLMs; in fact, they are probably impossible with pattern recognition alone.


Can humans solve more than the most trivial SAT problems? Keep in mind, AGI does not imply superhuman intelligence.


can you solve general instances of SAT?

can the average person?


With enough patience, yes.

For example: You have a goat, a wolf, a cabbage and you want to cross a river...


How would you do it, tree search? If yes, I tend to agree with your initial statement that one should be able to teach LLMs to apply simple heuristics before considering it AGI.


I don't know the answer to your question of "how to build AGI". Although if I had to guess, the AGI will probably have a supervisor algorithm (trained by RL), which will issue internal commands to pattern matchers (like GPT-4) to drive them to solve the problem. The supervisor algorithm will have little knowledge about any specific problem (like language or world facts), only tacit knowledge about learning and reasoning, and how to do them economically.

So the supervisor algorithm will do the tree search if needed.


I would be surprised if an LLM weren't able to do this the same way humans would: brute force with a couple of early backtracking conditions.

It would have to think out loud though.
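
For what it's worth, here is a minimal sketch (mine, not from the thread) of what that brute-force-with-backtracking search could look like for the canonical puzzle; the state encoding and move rules are my own assumptions, and Python is used just for illustration:

    # Depth-first search over (items on left bank, farmer side) states,
    # pruning any move that leaves wolf+goat or goat+cabbage unattended.
    ITEMS = frozenset({"wolf", "goat", "cabbage"})

    def safe(bank):
        # a bank without the farmer must not hold wolf+goat or goat+cabbage
        return not ({"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank)

    def solve(left=ITEMS, farmer="L", path=(), seen=frozenset()):
        if not left and farmer == "R":
            return path                              # everything is across
        if (left, farmer) in seen:
            return None                              # backtrack: state already tried
        here = left if farmer == "L" else ITEMS - left
        for cargo in [None] + sorted(here):          # cross alone, or take one item
            moved = {cargo} if cargo else set()
            new_left = left - moved if farmer == "L" else left | moved
            new_farmer = "R" if farmer == "L" else "L"
            unattended = new_left if new_farmer == "R" else ITEMS - new_left
            if not safe(unattended):
                continue                             # prune: something gets eaten
            step = f"take {cargo or 'nothing'} {farmer}->{new_farmer}"
            result = solve(new_left, new_farmer, path + (step,),
                           seen | {(left, farmer)})
            if result is not None:
                return result
        return None

    print(*solve(), sep="\n")  # seven crossings, goat first, as expected

The open question is whether an LLM can reliably do the equivalent bookkeeping "out loud" in its context window.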


GPT-4 can already solve that goat/wolf/cabbage problem.


Yes, but does it solve it because it read the solution somewhere? Can it adapt the existing solution to a new variation? Can it solve a similar problem with different things? This is what humans do all the time.


It can solve the variation "You have a rabbit, a wolf, a haypile and you want to cross a chasm". What kinds of variations do you have in mind?


If I add a small condition that makes the solution impossible, will it recognize that? Will it recognize for your example that it's a variation? Will it still be able to solve it when it is just a subtask of a bigger input?

If I ask it a leading question that intentionally relies on a wrong solution, will it recognize that?


I can't answer your questions (yet) but I have an observation. In the left sidebar where it summarizes the conversations in its "own words" the canonical example gets tagged with "River Crossing Puzzle" while my variation with the chasm gets tagged as "Transporting animals across chasm".

In the canonical example it also remarks "This is a classic river crossing puzzle" before delivering the solution.

EDIT: I tried some variations with "two wolves and a rabbit" and "three wolves and a rabbit". ChatGPT started bullshitting about its solution that was supposedly "a bit more complex". It started by taking one wolf to the other side. After I pointed out that the rabbit would be eaten by the remaining two wolves, it apologized for the mistake but just kept going, hallucinating "correct" solutions.


Del Spooner getting offended panel


That's fine. It's maybe not the IQ-180 AGI your demands call for, but if we're honest it's pretty close.


Feynman had a really nice story about how he was into puzzles when he was at Princeton. It took him a while to solve the new ones, but eventually he learned all the well-known instances so he could answer instantly. It made him a genius in other people's eyes.

All I want from AGI is to demonstrate that it can solve a straightforward logic problem (a puzzle, if you will) that it provably didn't see before. Or at least recognize that it is being indirectly given such a task. So far, evidence suggests it is not capable of that.


There's a 150-page paper solely to describe instances of it doing that. It's the article attached to this comment thread.


Well, it's about the standard of the proof. When I say "demonstrate", I don't mean just experimentally; I mean theoretically, to show that the algorithm is capable of reasoning about potentially arbitrarily large instances of puzzles.

That's what the experiments have shown - once the unknown instance gets large enough, the reasoning of the LLM breaks down. This is not the case with humans, who can, as noted elsewhere, do a tree search, form hypotheses, etc.


The paper in question demonstrates it doing exactly this with varying success.


You know the "you pass butter" scene from Rick & Morty?

I'm imagining humans being told "you complete thought sentences"


Under-rated comment...


Does anyone have insight into the GPT-4 model itself? What is the parameter count? Training procedure? I know "Open"AI hasn't released this data but I was hoping someone with inside knowledge would have leaked it by now.


I can't help but hear the distant, very very quiet echo of Clever Hans.


It would be interesting to figure out how Clever Hans does it, though. Don’t you want to know the tricks? Even when it’s a cheat, it might be a clever one.

For example, researchers eventually figured out that image recognition algorithms pay attention to textures.


Apparently the horse 'knew' the right answer by inferring from the questioner's behaviour: "Pfungst (the debunker) then examined the behaviour of the questioner in detail, and showed that as the horse's taps approached the right answer, the questioner's posture and facial expression changed in ways that were consistent with an increase in tension, which was released when the horse made the final, correct tap. This provided a cue that the horse could use to tell it to stop tapping." [1] However, there are gene regulatory networks that can actually count up to 3, with the mechanism of counting up to 2 being curiously different than the one for counting up to 3. [2]

"Every intelligence test is also a test of the questioner" [3]: we don't regard a simple liver cell as intelligent, yet it performs a complex task in a large problem space.

[1] https://en.wikipedia.org/wiki/Clever_Hans#:~:text=Pfungst%20....

[2] 2013, Malte Lehmann, "Genetic Regulatory Networks that count to 3", https://pubmed.ncbi.nlm.nih.gov/23567648

[3] Michael Levin, "Bioelectric Networks: Taming the Collective Intelligence of Cells for Regenerative Medicine", https://www.youtube.com/watch?v=41b254BcMJM


Hans was a cyborg sent from the future to test humanity's gullibility.


This is a pretty fluffy paper, especially for an institution like Microsoft Research. It says it's an "early AGI" in the abstract, but elsewhere says it's merely a "step towards AGI". The basis for this is asking ChatGPT a bunch of stuff, but they don't really present an overarching framework for what questions to ask or why.

The paper makes outlandish claims like "GPT-4 has common sense grounding" on the basis of its answers to these questions, but the questions don't show that the model has common sense or grounding. One of their constructed questions involves prompting the model with the equator's exact length—"precisely 24,901 miles"—and then being astonished that the model predicts that you're on the equator ("Equator" being the first result on Wikipedia for the search term "24,901"). It's also the case that while GPT-4 can say a bear at the north pole is "white", it has no way of knowing what "white", or "bear", or "north" actually represent.

Are there folks out there doing rigorous research on these topics, who have a framework for developing tests of actual understanding?


> It's also the case that while GPT-4 can say a bear at the north pole is "white", it has no way of knowing what "white", or "bear", or "north" actually represent.

This is a preposterous claim that you could easily disprove within a few minutes of using it.


>it has no way of knowing what "white", or "bear", or "north" actually represent.

What does it mean to know what "white", "bear" or "north" actually represent?


> Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.

I don't know why, but my brain refuses to accept GPT-4 as something close to AGI. Maybe I am wrong. It is hard to believe that our brain is just a bunch of attention layers and neural nets.


Well it’s not quite that simple. Brains use spiking neural networks, not the kind used typically in artificial neural networks like those used by LLMs. The “weights” can be changed over time, new connections and even new neurons formed. And the number of connections (“weights”) is about 500-1000x more in our brain than GPT-3. The connection topography is a lot different.

But ultimately, our brains are still just made of neurons. As far as we know, there isn’t some sort of extreme molecular computing going on (ie memories directly stored in RNA or whatever) or any large scale quantum mechanics (temperature too high).

The differences between AI approaches like artificial neural networks and our animal meat brains could be just the difference between a propeller and flapping wings. Same base mechanics (airfoil producing lift as thrust), different substantiation.


Do you consider that every neuron in the brain has unique DNA and ancestry?


It seems no. Those are facts - whoever argues with facts (parent still downvoted?) is an idiot.

https://www.scientificamerican.com/article/scientists-surpri...

https://www.science.org/doi/10.1126/science.aab1785 - Somatic mutation in single human neurons tracks developmental and transcriptional history

(good luck simulating that)


> It is hard to believe that our brain is just a bunch of attention layers and neural nets.

Our brain isn't, but I'd wager the architectural complexity of a physical, neuronal brain is not optimized for all useful mental tasks, and perhaps has a fair number of local maxima that are nearly vestigial in their overall positive impact on cognition. Just because the human brain's model of cognition is the only way nature has been able to create GI doesn't mean it's the only way GI can be attained.

The best kind of machine is the simplest one needed to produce a desired outcome.


I agree that GI can have a different implementation compared to our human brain, but one thing is for sure: as of right now, the human brain can become more creative with a fraction of the data consumed by GPT-4.

GPT-4 could be AGI, but it feels like cheating to achieve AGI by feeding it the entire internet. If someone can build AGI with only the data that a human consumes in their lifetime, then that, imho, is the real AGI.


I guess the challenge here is that the human mind is not a blank slate, and has been optimized first by billions of years of evolution.

If it takes all the data on the internet (or more) to bootstrap AGI, but that system is then capable of leveraging its knowledge to solve new out-of-distribution tasks, that seems like a fair test to me.

I agree with the article that we see "sparks" of this generality with GPT4.


   become more creative with a fraction of the data consumed by GPT-4
Not if you understand the input stream of vision as an equivalent input stream of semantic tokens, as in multimodal models. Under that definition, people looking around for 10 years receive much more training data than large language models, and thus perform a bit better at zero-shot inference.
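
As a rough back-of-the-envelope (all numbers below are my assumptions, not from the thread: the often-cited ~10 Mbit/s estimate for the optic nerve, and a ~10-trillion-token text corpus):

    # assumed: ~1e7 bits/s per eye through the optic nerve, two eyes, 16 waking h/day
    visual_bits = 2 * 1e7 * 16 * 3600 * 365 * 10   # ~10 years of looking around
    # assumed: ~1e13 training tokens at ~2 bytes/token for a large language model
    text_bits = 1e13 * 2 * 8
    print(f"{visual_bits:.1e} vs {text_bits:.1e} bits")  # ~4.2e15 vs ~1.6e14

Even with those crude numbers, the visual stream comes out roughly an order of magnitude larger, which is the point being made.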


Not sure I would call constant real-time perceptual stimuli since before birth "a fraction of the training data."


Or better yet, chuck it in an open plain and see how long it takes to figure out how to attach a rock to a stick and fight a gazelle to refuel its energy supply.


It's a well-established principle in computer science that the input/output behavior of a system may not capture all of its important properties. Take zero-knowledge proofs for example. Their entire point is that they are indistinguishable from randomly generated garbage from a specific distribution. The proofs only gain value if you make causal assumptions about the system that generated them.

I don't think systems like GPT-4 can ever be truly intelligent, because they simply output randomly generated garbage from a specific distribution. Their output may eventually be indistinguishable from that of a truly intelligent system, but the causal mechanism behind them is not intelligent.

On the other hand, most people lose their ability to think when they are under sufficient pressure (such as fighting for their lives). It's plausible that people are fundamentally no different from systems like GPT-4 in such situations. Then a language model could be a key part of an AGI, but true intelligence would also need higher-level causal mechanisms.


Do you think anything digital could ever become conscious?


Based on the wording of your question, I can't see a way today to prove it never could, therefore the answer currently must be "Yes, it may someday be possible."

Note: This assumes that "conscious" as defined in this context is specific enough for the question to ever be meaningfully answered "Yes." This is a non-trivial assumption because there are criteria by which some would judge AIs as already conscious. Alternatively, some philosophers of mind have criteria by which they assert humans aren't conscious.


The differences between ChatGPT and a conscious human brain are not insurmountable.

Let's consider a potential future conscious AGI created by advancing from something like ChatGPT.

The human brain is "always on". It's possible to have a digital system be always on, i.e. not just train once and then just respond, but constantly take new input.

The human brain has way more connections/layers than ChatGPT. It's possible to imagine the digital system getting the same number of connections.

The human brain gets real time sensory input. It's possible to add cameras, microphones, etc to that digital system so it gets a constant feed. Maybe even let it process what it saw during the day in a batch training/GC run (we could call that "sleep").

The human brain has a different topology. It's possible to alter the topology of a digital system's neural network to mimic that, instead of using the ChatGPT topology. It's not like we're forever doomed to its simpler statistical model. But it's interesting that it already gets very significant emergent intelligence-like properties.

The human brain is self-conscious. This could very well be an emergent property of the above. I think all that's needed is some form of feedback mechanism.


The question is whether consciousness is computable. Can a Turing machine be conscious? Probably not.

https://www.newscientist.com/article/mg25634130-100-roger-pe...

https://www.youtube.com/watch?v=hXgqik6HXc0


Note that Penrose's answer is not the "consensus".

Also, if I recall correctly, Penrose doesn't cover whether the quantum part could be modelled too. It's just statistics after all.


So the consensus is that consciousness is computable by a Turing machine?


The consensus is that if it's not, it's not because of the reasons Penrose gives.


Is there a consensus? I haven't been able to find much else via Google search. At that level of theorizing I wouldn't expect any consensus, only original ideas from a few elite researchers.


I have a feeling it's like the saying "Any sufficiently advanced technology is indistinguishable from magic". At a certain point they could become practically identical things.


yes


Conversely, after using ChatGPT-4 (and generally loving it) -- I'm at peace with this possible fact.


There's no rule that AGI has to have the same architecture as a human...


There’s no agreement about what an AGI is or does, let alone how it should do it.


It seems clear to me that these systems think in a meaningful sense, but I don't think they are beings. In Cybernetics there is a result that says that any well-regulated system must contain a model of itself. This seems as good a definition as any of a "being", and by this definition these language models don't make the cut.


The architecture for large language models is summarized in the training set for large language models. With fairly minimal modification, such as via a plug-in, ChatGPT and the like are Turing complete and can thus model themselves.


Hmm, then, from first principles, we should expect "ghosts" to arise in those systems. (These will not be the "virtual entities" that people talk to and call by name (Alexa, Cortana, Siri, etc.), but more akin to fixed points in the flow of information.)


"We are the meat in our heads" is the way I've heard it said that human intelligence is just a physical phenomenon created by our brains. And there was never any reason to believe that intelligence could not arise from other substrates.


Can we agree that an "early (yet still incomplete) version of AGI" isn't AGI?


To me it's clear that they think it's AGI, and that they think AGI is such a loaded concept that it's not worth "announcing" it.

Beliefs are changing so fast right now. The term "AGI skeptic" will soon (if not already) mean "I don't trust AGIs in positions of authority or power" rather than "I don't think the technology is capable of matching our level of cognition."


If you think AGI means artificial and generally intelligent, then yeah, it's 100% AGI. But some people have such loaded expectations of AGI that a significant chunk of the human population wouldn't even pass, lol.


I hope we can agree that “not completely X” is “not X”.


Why would we agree on that? Anything "not completely X" surely has the attribute of approaching X-ness; otherwise it would just be "not X".


So if you get your finger cut off, or you go blind, you're not human any more?


Unless they somehow cured GPT-3's schizophrenia and this model is a significant upgrade I'm not buying it - no matter how good it is at proving trivial mathematics theorems in the style of Eliot or whoever. Too often I have dealt with "The answer to your question is X. Oh, sorry, you are right, the answer is actually Y. Oh, it is good of you to ask for a proof, sure I can prove the answer is Y, I used this (hallucinated) method described in this (hallucinated) paper. Oh, sorry, you are right, I cannot find any evidence that the method and paper I mentioned earlier actually exist, oops!".


GPT AI systems remind me of Chinese Room thought experiment:

https://en.wikipedia.org/wiki/Chinese_room

This is also similar to the Duck Test:

https://en.wikipedia.org/wiki/Duck_test

Depending on the context, there are generally two takes: "It is (or is not) a duck", and "It doesn't (or does) matter whether or not it's a duck". These aren't mutually exclusive.


While [pre-GPT-4 ChatGPT] performs non-trivially on both tasks, there is no comparison with the outputs from GPT-4. These preliminary observations will repeat themselves throughout the paper, on a great variety of tasks. The combination of the generality of GPT-4’s capabilities, with numerous abilities spanning a broad swath of domains, and its performance on a wide spectrum of tasks at or beyond human-level, makes us comfortable with saying that GPT-4 is a significant step towards AGI.


If they remove the guardrails, then we can truly assess its intelligence. Currently, humans are directly interfering with a certain aspect of it. If it can provide a response without Microsoft's stock being affected by removing these human-imposed limitations, then I would be genuinely impressed.


All this GPT stuff feels reminiscent of Frank Herbert's novel Destination: Void...


“ We note however that there is no single definition of AGI that is broadly accepted, and we discuss other definitions in the conclusion section.”

We know it can do a lot of cool stuff, but without a pinned down definition the headline here is useless.


The definition will be narrowed as computational capabilities expand.


It’s already smarter than 50% of us, and more knowledgeable than 99% of us. It no longer matters what label we give it, and we’re only a few years away from it giving labels to us.


Well it took us just about 65 years and a couple of AI winters to get convincing NLP going. And it takes about 1 TB of RAM...

So either AGI is around the corner or a generation away. Same as positive yield fusion reactors?


Been pair-coding with GPT-4 for the last week; it's definitely AGI...


No, it's a Chinese room, but instead of Chinese it's Stack Overflow snippets.


So what? If it writes novels like an AGI, codes like an AGI and explains complex topics like an AGI, then it's probably an AGI...


That's just it. It doesn't.


How much have you tried GPT-4?


How do you pair program with ChatGPT?


Depends on the task but some combination of asking it for skeleton code for new tasks and sending it my written code or error messages and asking for corrections or potential solutions.

It's very effective. If you are at least semi-new to the technology you are using, it will explain and teach you things you didn't know before, and if you know the tech by heart it saves you from having to type it all out.

For example, yesterday I had to make a custom container with some pretty involved dependencies that also had to be runnable on AWS Lambda (which I haven't used much before). Me and GPT-4 went back and forth with Dockerfile code and error messages for a few hours, and then it ran like a charm. It would probably have taken me 1-2 days of regular coding and googling otherwise.


Why not let it do all the work then?


It's not perfect, of course.


It also can't actually start anything on its own. That is, it has no motivation, like an actual AGI would have. Don't pollute that term as well. An Artificial General Intelligence is something like Asimov's robots or R2-D2, not search-on-steroids.


No it doesn’t.


Good find. I am reading through this now.


Some will say it's as important as the internet or mobile, but they're wrong. This is like the discovery of fire or the invention of language.


- "ftxbro"


After lurking I made this account only to post a joking-not-joking explanation of why Alameda had the weirdly specific credit limit $65,355,999,994 with FTX and why I thought it could be a funny off-by-almost-1000x bug/typo/mishap https://news.ycombinator.com/item?id=34473811 but I think almost no one read my comment because I posted it so late after the thread had scrolled off the front page :(


I appreciated it :)


I always thought General Intelligence would be achieved by IBM or at least Apple, not by Microsoft. Now it will be used to pressure us into Windows upgrades...


What rock have you been living under for the last decade if you thought IBM would solve AGI? Watson was a complete disaster and they have zero AI talent in the company.


Oh, this was a reference to 2001: A Space Odyssey


Visual Studio and Azure AD.


Rename ChatGPT to Clippy.


I wouldn't be surprised if they actually brought back Clippy as a character now that the technology's improved


They've moved on to Cortana, or Sydney, I suppose.


CliPT


GPT-4's proof of the infinitude of primes, as quoted in the paper, is wrong.

    Yes, I think I can,
    Though it might take a clever plan.
    I’ll start by noting Euclid’s proof,
    Which shows that primes aren’t just aloof.
    Assume we have a finite list,
    Of primes, and that none have been missed.
    Multiply them all together,
    And add one, just to be clever.
    The result will be a number,
    That has no prime factors, oh wonder!
Not true. The result may itself be prime. For example, say your finite list of primes is {2,3}. Multiplying them together and adding one yields 2*3 + 1 = 6 + 1 = 7, which is itself prime. Hence it has a prime factor: 7.

    But every number has a prime,
    That divides it, it’s just a matter of time.
    So we’ve found a contradiction,
    And our finite list needs eviction.
    There must be infinitely many primes,
    And that’s the end of my rhyming lines.
A correct version of the proof might say that the result has a prime factor; however, that prime factor is not an element of the supposed finite list, hence the list is incomplete.
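
A quick sanity check of that corrected reading (my own snippet, not from the paper), showing that product+1 is sometimes prime and sometimes composite, but its prime factors always lie outside the starting list:

    from math import prod

    def smallest_prime_factor(n):
        # trial division; returns n itself when n is prime
        d = 2
        while d * d <= n:
            if n % d == 0:
                return d
            d += 1
        return n

    for primes in ([2, 3], [2, 3, 5, 7], [2, 3, 5, 7, 11, 13]):
        n = prod(primes) + 1
        p = smallest_prime_factor(n)
        print(primes, "->", n, "smallest prime factor:", p, "| in list?", p in primes)
    # [2, 3] -> 7 (prime itself); [2, 3, 5, 7, 11, 13] -> 30031 = 59 * 509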


You missed a line

> Assume we have a finite list,
> Of primes, and that none have been missed.

It's assuming that the finite list contains all primes and then noting that you can construct a new number which has no prime factors, which is a contradiction.


It was a valid proof by contradiction. If you had a finite list of primes, then you end up constructing a number that has no prime factors.


What does it mean when, in trying to demonstrate that a potential artificial GI can't understand a proof, a biological GI actually demonstrates that they don't understand the proof?

Joking aside... the approach of dismissing generality of intelligence based on the presence of mistakes seems flawed.


You literally just proved it right



