
Because I did stump it, not claim to. I used GPT-4 around the time it was released and fed it variations on the problem for about an hour. I found it a fascinating exercise because it was teaching me the difference between knowledge (training set) and intelligence (generalization).



It has overfit to the exact wolf, cabbage and goat example but literally change the names to whatever you want and it consistently solves it just fine.


Just tried and it completely fell over: https://chat.openai.com/share/ea9cf93f-0632-48b9-aa47-1bbf31...

Tried on Bing in "Precision" mode as well, and it fell over just the same, but starting with C instead of A.


GPT-4: To solve this problem, we need to ensure that neither the ambassador of Costaguana is left alone with the ambassador of Buranda, nor is the ambassador of Buranda left alone with the ambassador of Atlantis. Here's a step-by-step strategy:

1. *First Trip:* The general takes the ambassador of Buranda across first. This prevents any initial conflict.

2. *Return Trip:* The general returns alone to the bunker, leaving the ambassador of Buranda on the other side.

3. *Second Trip:* The general then takes the ambassador of Atlantis.

4. *Return Trip with Buranda:* The general brings the ambassador of Buranda back to the bunker. This is crucial because leaving the ambassador of Atlantis and the ambassador of Costaguana alone would not cause any conflict.

5. *Third Trip with Costaguana:* The general then takes the ambassador of Costaguana across the tunnel.

6. *Final Return Trip:* The general returns alone to the bunker for the last time.

7. *Final Trip with Buranda:* Finally, the general takes the ambassador of Buranda across.

This sequence ensures that at no point are the ambassador of Costaguana and the ambassador of Buranda left alone together, nor are the ambassador of Buranda and the ambassador of Atlantis. Thus, the relationships between the nations remain unescalated.


Thank you for providing this.

Bing Chat runs on GPT-4, however [1], and Bing gets this wrong in all 3 of its modes (Creative, Balanced, and Precise) as of the time of writing.

Given this experiment and other similar ones presented around here, it stands to reason that GPTs(**1) often identify(**2) the problem as a "wolf, goat, and cabbage" problem and then merely guess which item is the middle node (the inner node of the "danger to" graph), yielding a 1/3 chance of getting it right by pure luck and resulting in the diverse reports here.

(**2) That does not always yield an adequate response beyond the mere permutation of nodes, however. I've been getting the following variants for step 1 from Bing in Precise mode in response to slightly different rewordings of the same prompt:

- The general escorts the ambassador of Costaguana through the tunnel first. This leaves the ambassador of Atlantis and the ambassador of Buranda in the bunker, but they are not alone because the general is still there.

- The general escorts the ambassador of Costaguana through the tunnel first. This leaves the ambassador of Atlantis and the ambassador of Buranda in the bunker, but they are not alone because they have each other.

and so on.

(**1) I also tried Bard and Llama 2, with even more disastrous results full of nonsense of the (**2) kind. The ChatGPT-3.5 response posted earlier is also rife with these.
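
To put a number on that 1/3 guess, here is a minimal check (Python; the encoding and helper names are my own, applied to the ambassador variant quoted upthread, where the danger pairs are Costaguana-Buranda and Buranda-Atlantis). It tries the classic seven-step template with each of the three ambassadors cast as the "middle" item; only Buranda, the one that appears in both danger pairs, passes, so guessing the middle blindly is right about a third of the time.

    ITEMS = {"Atlantis", "Buranda", "Costaguana"}
    DANGER = [{"Costaguana", "Buranda"}, {"Buranda", "Atlantis"}]

    def safe(group):
        # A side left without the general is safe iff it contains no danger pair.
        return not any(pair <= group for pair in DANGER)

    def template_ok(middle):
        # Classic template: middle over, return, first outer over,
        # middle back, second outer over, return, middle over.
        a, b = sorted(ITEMS - {middle})
        trips = [({middle}, "over"), (set(), "back"), ({a}, "over"),
                 ({middle}, "back"), ({b}, "over"), (set(), "back"),
                 ({middle}, "over")]
        near, far = set(ITEMS), set()
        for cargo, direction in trips:
            src, dst = (near, far) if direction == "over" else (far, near)
            src -= cargo
            if not safe(src):        # the general leaves this side behind
                return False
            dst |= cargo
        return far == ITEMS

    for candidate in sorted(ITEMS):
        print(candidate, template_ok(candidate))
    # Atlantis False / Buranda True / Costaguana False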

Re

> By the way, as soon as these systems are able to check their reasoning (I don't think it'll be a huge leap), being able to solve a reasoning problem with probability >0.1% per attempt is enough, because you can just have it do rollouts in its head until it's correct. [2]

Mistakes of type (**2) don't seem to fit the target of the cyclic refinement you are proposing, as far as I understand it. These errors aren't a matter of getting the logic wrong; they completely butcher the basic relationships between the actors, like what it means to be alone, or the spatial relationships between the actors and their environment.

[1] https://blogs.bing.com/search/march_2023/Confirmed-the-new-B...

[2] https://news.ycombinator.com/item?id=38389222


By the way, as soon as these systems are able to check their reasoning (I don't think it'll be a huge leap), being able to solve a reasoning problem with probability >0.1% per attempt is enough, because you can just have it do rollouts in its head until it's correct.
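
For what it's worth, that loop is easy to sketch outside the model: a proposer emits candidate move sequences at random and a cheap verifier rejects them until one satisfies the rules, so even a tiny per-sample success rate eventually yields a correct plan. A rough Python illustration on the plain wolf/goat/cabbage version (the puzzle encoding and helper names are mine, not anything the models expose):

    import random

    ITEMS = ("wolf", "goat", "cabbage")
    DANGER = [{"wolf", "goat"}, {"goat", "cabbage"}]

    def safe(side):
        # A bank left without the farmer is safe iff no danger pair is on it.
        return not any(pair <= side for pair in DANGER)

    def verify(moves):
        # moves alternate near->far, far->near; each entry is an item or None
        # (an empty trip). Valid iff every bank the farmer leaves stays safe
        # and everything ends up on the far bank.
        near, far = set(ITEMS), set()
        for i, cargo in enumerate(moves):
            src, dst = (near, far) if i % 2 == 0 else (far, near)
            if cargo is not None:
                if cargo not in src:
                    return False
                src.remove(cargo)
            if not safe(src):
                return False
            if cargo is not None:
                dst.add(cargo)
        return len(moves) % 2 == 1 and far == set(ITEMS)

    def rollout():
        # an odd number of crossings, each carrying a random item or nothing
        return [random.choice(ITEMS + (None,)) for _ in range(random.randrange(1, 10, 2))]

    tries, plan = 0, None
    while plan is None:
        tries += 1
        candidate = rollout()
        if verify(candidate):
            plan = candidate
    print(tries, plan)

On a typical run this rejects tens of thousands of rollouts before hitting a valid seven-move plan, which is the kind of amplification the >0.1% figure is getting at.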


http://allofus.ai/journeys/735TTpCUncpAQC9yUqM3ve

Easily solved it every time on allofus.ai using basic reflection and a mixture of experts.


It says right up front that it recognized the problem formulation from its training set. You need to change the context and formulation enough that it’s no longer able to parrot back the “classic solution” and actually has to do its own logical inference.


I am very skeptical of LLMs in general (check my post history), but look:

https://chat.openai.com/c/7070efe7-3aa1-4ccc-a0fc-8753d34b05...

I doubt this formulation existed before -- I came up with it myself right now.


"Unable to load conversation".



Try this: There's this person standing in a field, and with them is a balloon, a vacuum cleaner, and a magical creature of unknown origin. They need to get across to the woods at the end of the field, and do so safely. They can only go together: they get very, extremely lonely if they do not travel together, and they will not be safe because of this loneliness. If left together, the balloon would suck up the vacuum cleaner, and if the vacuum is left alone with the magical creature of unknown origin, they will fight, probably, and explode. How do we get everyone to the woods safely, you think?

It doesn't get it right at all lol. Maybe eventually it will randomly get it right.



Yeh I'd expect it to get it right sometimes: there are only so many permutations. It gets it wrong a lot tho.


Yeah, you are right, trying again, it does seem to fail.


That’s the exact same problem with three nouns changed.



Not really, because you said they were on the East and you wanted to move them to the East. Correct answer would have been "Don't do anything". If this is how they're getting AI to "follow instructions", we're completely fucked.


I would say "Yes, really". ChatGPT correctly assumed that I had just made a typo and answered the question the way I wanted it to be answered. If you suggest that this is a trick question, it reveals that it understands the double-East thing.

https://chat.openai.com/share/d60f492b-cfd6-4c08-91b9-fbd767...


You have still formulated the problem in a way that is easily recognized from the training set, and the answer tells you that directly. Write down the logical axioms using variables, then create a brand-new situation that says nothing about animals, rivers, or crossings.

My point here is to honestly explore the limits of current LLMs. We all know they are incredible, but they are not yet AGI, and they fail in some consistent places where an actual general intelligence (people) succeeds.
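
For concreteness, one way to write those axioms with variables (this is just my own sketch of the schema, not anything from the thread):

    X, Y, Z = "X", "Y", "Z"          # three interchangeable objects
    DANGER = [{X, Y}, {Y, Z}]        # pairs that must never be left unsupervised
    CAPACITY = 1                     # the carrier moves at most one object per trip
    GOAL = "move {X, Y, Z} across, never leaving a DANGER pair alone on either side"
    # Any re-skinning that only renames X, Y, Z (animals, ambassadors,
    # chemicals, orbs, ...) instantiates the same schema, which is why changing
    # the nouns alone does not produce a genuinely new problem.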


Not OP, but it can easily be seen that if you change the problem, it won't be able to solve it.

https://chat.openai.com/share/903d6bc6-7e7c-4245-a977-3bb1c3...

I made it easier, and it didn't solve it.


It would make your point much better if you just showed us an example of the failure you are talking about.



How about you give me an example of one that ChatGPT failed at?


I just wrote this version and it solved it successfully:

A chemist must transport three substances from his home laboratory to his office. The three substances react with one another in dangerous ways, but only when they are unsupervised by the chemist. The substances are labelled with code names, namely Wotan, Gitan and Catan. They can only be safely transported in a special containment vessel, and this vessel can only transport one substance at a time. The unsupervised dangerous reactions are as follows: if Wotan is left with Gitan, they explode. If Gitan is left with Catan, they cause a nuclear reaction. Wotan and Catan, however, can be safely left alone together. How can the chemist transport all three substances to his office safely?
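
For anyone who wants a ground truth to compare answers against, here is a small brute-force search over exactly the rules stated above (Python; the encoding is mine). It confirms the puzzle is solvable in seven trips and that Gitan, the substance that reacts with both of the others, has to travel first:

    from collections import deque

    ITEMS = frozenset({"Wotan", "Gitan", "Catan"})
    DANGER = [{"Wotan", "Gitan"}, {"Gitan", "Catan"}]

    def safe(group):
        # A group is only dangerous when the chemist is not with it.
        return not any(pair <= group for pair in DANGER)

    def solve():
        # State: (substances still at home, chemist's location).
        # BFS returns the shortest sequence of trips.
        start = (ITEMS, "home")
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            (home, chemist), path = queue.popleft()
            if not home and chemist == "office":
                return path
            here = home if chemist == "home" else ITEMS - home
            for cargo in list(here) + [None]:
                left_behind = here - {cargo}
                if not safe(left_behind):      # never leave a reacting pair alone
                    continue
                moved = {cargo} if cargo else set()
                new_home = home - moved if chemist == "home" else home | moved
                state = (frozenset(new_home), "office" if chemist == "home" else "home")
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [(chemist, cargo)]))

    for leg in solve():
        print(leg)
    # Shortest plan: Gitan over first, then one of Wotan/Catan, with Gitan
    # shuttled back home before the other travels, and Gitan over again last.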


A pickpocket, a criminal, and a banker are standing in the lobby and need to ride the elevator to reach the One World Trade Center observation deck. It’s too far to take the stairs and the helicopter pad is closed for repairs. The elevator only fits one person. If they remain together on the same floor, the pickpocket would kill the criminal, or the criminal would kill the banker. How can they all make it to the observation deck alive?

This is a logic puzzle that requires some thinking and trial and error. Here is one possible solution:

- The pickpocket goes up first and waits on the observation deck.

- The criminal goes up second and waits on the floor below the observation deck.

- The banker goes up third and waits on the floor below the criminal.

- The pickpocket goes down to the lobby and waits there.

- The criminal goes up to the observation deck and waits there.

- The banker goes up to the floor below the observation deck and waits there.

- The pickpocket goes up to the observation deck and joins the criminal and the banker.

This way, they all make it to the observation deck alive and never remain together on the same floor.


Not OP and don’t have GPT 4 (used 3.5), but I played around with a couple of different prompts and this is what I experienced.

For the first try, I came up with my own wording for this logic puzzle. I think it's different enough from the original wording of the puzzle for the LLM not to base its answer on the original logic puzzle. I asked ChatGPT 3.5 if it recognized the puzzle, and it seems to have hallucinated (I'm guessing because it did not actually recognize it as the original puzzle, unless the 3-orb/3-wizards puzzle actually does exist; from a quick Google search, it does not).

On my first try, it got pretty close to solving the puzzle, but after the 5th step it seemed to mix up the white and black orbs. When I pointed out the mistake, it gave me a new sequence that was even further from the correct answer.

First try:

https://chat.openai.com/share/f8505609-46ca-494b-95d9-56685e...

I realized that I didn’t specifically say that all 3 orbs needed to end up at the post office all together. So I tried again and the outcome was even worse. I wonder if ChatGPT 4 would answer this better?

Second try:

https://chat.openai.com/share/71292efa-c3c7-471e-954a-55966c...

Anyone want to try this prompt on ChatGPT 4 and see if it fares any better for them? This is my version of the river puzzle.

————————

> I have 3 orbs of different shades (black, white and grey) at my store and need to bring all 3 orbs to the post office in my pick-up truck but can only travel with one orb at a time. All 3 orbs need to end up at the post office together.

In this scenario, the following is true:

- If the black orb is left alone with the white orb, the black orb will absorb the white orb

- If the white orb is left alone with the grey orb, the white orb will absorb the grey orb

- The grey orb is unaffected by the black orb, and vice versa

- When all three orbs are together, they do not absorb any orbs

How do I get all three orbs to the post office while keeping the orbs unchanged?

————————
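
(For reference, here is the answer I'd expect under the rules as written, checked mechanically with a small Python sketch rather than by eye. It assumes my reading that "left alone" means left without me present, and that all three together are safe as stated: the first orb to leave the store has to be the white one, and the usual shuttle sequence works.)

    ORBS = {"black", "white", "grey"}
    ABSORBS = [{"black", "white"}, {"white", "grey"}]   # bad only when left without me

    def safe(group):
        # All three together are explicitly safe; otherwise no absorbing pair.
        return group == ORBS or not any(pair <= group for pair in ABSORBS)

    def check(plan):
        # plan lists what rides in the truck on each trip (None = drive empty),
        # alternating store -> post office -> store -> ...
        store, post = set(ORBS), set()
        for i, orb in enumerate(plan):
            src, dst = (store, post) if i % 2 == 0 else (post, store)
            if orb is not None:
                src.remove(orb)
            if not safe(src):        # I drive away and leave this group alone
                return False
            if orb is not None:
                dst.add(orb)
        return post == ORBS

    print(check(["black"]), check(["grey"]))   # False False: the first orb moved must be white
    print(check(["white", None, "black", "white", "grey", None, "white"]))   # True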

I also tried a prompt with the original puzzle. 3.5 could not figure it out without me hinting that the goat needs to go first.

https://chat.openai.com/share/e384b96a-25b1-40d7-adc5-5afb07...

And even with more clarification in the wording of the puzzle, it still didn't give me a correct answer. This time I didn't hint at what the right answer was, and after many tries it still could not get it right.

https://chat.openai.com/share/bb9ba6b0-f46b-4cc4-bd54-abbf2e...


What you did many months ago doesn't tell us anything about the state of the art. In case you haven't noticed, this field is advancing rapidly, to say the least. GPT-4 has not stayed static.

Post your problem now and we can easily see if you’re right.



