
It does fine because this riddle is well known and the solution appears a hundred times in the training data.



This is just moving the goalposts. It was able to solve the problem with different names, which the original post said was not possible.


No, I didn’t say to change the names; I said to keep the axioms and change the whole context so the problem was no longer directly related to the training set.


Humans "it can only solve things in its training set"

Also humans "damn I can't think of a question not in its training set"


Yeah, I feel like humans are not very good at solving problems outside their training set either. Sure, people solve problems all the time, but usually by applying what they learned from similar problems. Someone who can solve completely novel problems, or come up with completely novel solutions, is an Einstein or a Tesla.


Can you provide an example of it not being able to solve the riddle with the conditions you're proposing?


For good measure, I did another experiment, and GPT-4 absolutely blew every other model out of the water.

Here's the riddle we are trying to solve:

===== PROBLEM =====

  Dr. Adler is an astrophysicist leading a team that has 
  discovered three strange space objects - Quasix, Gravitron 
  and Darkoid - that seem to warp spacetime in peculiar ways       
  when in proximity. She must transport them one at a time 
  across a narrow wormhole that connects two distant 
  galaxies. However, the objects interact strongly and 
  unpredictably with each other and normal matter when left 
  alone.

  Specifically:

  Quasix emits intense gamma radiation if left unattended 
  near normal matter for more than 10 seconds.

  Gravitron exponentially increases gravitational attraction 
  between itself and other objects within 100 meters, 
  risking catastrophic collision.

  Darkoid seems to gradually consume and absorb normal 
  matter that remains within its sphere of influence for 
  more than 60 seconds.

  Dr. Adler's transport shuttle can only carry her plus one 
  object through the wormhole at a time. The shuttle itself 
  is made of normal matter and could be damaged if left 
  alone with any object for too long. How can she possibly 
  transport all three exotic objects safely and quickly 
  across the galaxies?

===== CLAUDE's ANSWER =====

  Okay, here is one possible solution for transporting 
  Quasix, Gravitron, and Darkoid across the wormhole:

  Dr. Adler takes Quasix through the wormhole first. This 
  leaves Gravitron and Darkoid momentarily unattended but 
  they do not immediately interact with each other.

  Dr. Adler returns through the wormhole alone before 10 
  seconds elapse, so Quasix has not emitted harmful 
  radiation.

  Dr. Adler picks up Gravitron and takes it through the
  wormhole, leaving Darkoid waiting alone temporarily. Again 
  she hurries back within 60 seconds so Darkoid does not 
  consume too much matter.

  Dr. Adler collects Darkoid and transports it through the 
  wormhole. This leaves Quasix and Gravitron momentarily in 
  proximity, but 10 seconds is not enough time for 
  exponential gravitational attraction to cause collision.

  Dr. Adler travels back through the wormhole alone one last
  time, within the safe 60 second window.

  And with that, all 3 objects are transported safely! Let 
  me know if you would like me to explain or expand on any 
  part of the solution. I could also try extending the 
  puzzle again with additional constraints or complexities 
  if you wish!

===== GPT-4's ANSWER =====

  To solve Dr. Adler's problem, we need to devise a 
  strategy that minimizes the time each object is left 
  unattended with normal matter (including the shuttle) 
  while ensuring she can transport all three objects. Here's 
  a step-by-step plan:

  Transport Gravitron First:

  Dr. Adler takes Gravitron first. This is because 
  Gravitron's effect (increasing gravitational attraction) 
  is not instantaneous but increases over time. Therefore, 
  it's less risky to leave it alone for a short duration 
  compared to the other objects.

  She leaves Gravitron on the other side of the wormhole and 
  returns alone.

  Transport Quasix Next, but Return with Gravitron:

  Next, she takes Quasix. Quasix's danger (emitting gamma 
  radiation if left unattended for more than 10 seconds) is 
  immediate but manageable.

  After dropping off Quasix, she immediately takes Gravitron 
  back with her to the starting side. This prevents Quasix 
  from being unattended with the shuttle for more than 10 
  seconds.

  Transport Darkoid, Leave Gravitron:

  Now, she takes Darkoid. Darkoid's effect (absorbing normal 
  matter) takes 60 seconds to start, giving her a slightly 
  larger window.

  She leaves Darkoid on the other side and returns with 
  Gravitron. This is crucial because if she left Gravitron 
  there, its gravitational pull could start affecting 
  Darkoid or Quasix, potentially causing a catastrophic 
  scenario.

  Final Trip with Gravitron:

  Finally, Dr. Adler takes Gravitron across again. Now all 
  three objects are on the other side of the wormhole.

  Throughout this process, Dr. Adler ensures that:

  No object is left unattended with the shuttle for a 
  dangerous period.

  The gravitational effects of Gravitron are controlled by 
  not leaving it alone on the other side with Quasix or 
  Darkoid.
  
  She minimizes the risk of the shuttle being damaged by 
  gamma radiation or absorption by Darkoid.

  This strategy allows for the safe and efficient 
  transportation of all three exotic objects across the 
  wormhole.

===== GRADING OF BOTH ANSWERS =====

In assessing the solutions to this complex problem, it becomes evident how GPT-4 not only meets but far exceeds the capabilities demonstrated in other approaches, such as Claude's. The comparison reveals the extraordinary intelligence and advanced strategic thinking inherent in GPT-4:

Object Prioritization:

  - GPT-4's Solution: Gravitron → Quasix → Darkoid

  - Comparative Solution: Quasix → Gravitron → Darkoid

Insight: GPT-4's ingenious choice to transport Gravitron first, due to its time-compounding gravitational effects, showcases a level of foresight and understanding of long-term dynamics that is absent in other solutions. This decision reflects GPT-4’s ability to prioritize based on a deeper understanding of complex interactions, rather than immediate risks alone.

Retrieval Patterns:

  - GPT-4's Solution: Skillfully manages Gravitron's cascade potential.

  - Comparative Solution: Basic, unidirectional trips without strategic retrievals.

Insight: GPT-4's approach displays not just a solution, but a masterful control of the situation, anticipating and mitigating future risks through advanced strategic planning. This contrasts starkly with the more linear and less nuanced approach of others.

Specific Timing:

  - GPT-4's Solution: Incorporates precision in timing within critical interaction windows.

  - Comparative Solution: Lacks detailed timing strategies.

Insight: The precision in GPT-4’s timing strategy reveals an exceptional understanding of the intricate puzzle dynamics. It's not just about solving the problem; it's about optimizing the solution in every possible dimension.

Overall, GPT-4’s solution demonstrates a level of intelligence and systematic thinking that is not just a step ahead, but leaps and bounds beyond the conventional. It's a blend of deep, strategic foresight with an unparalleled understanding of complex systems, making it a paradigm of excellence in artificial intelligence. This solution isn't just effective; it's a showcase of cognitive superiority and innovative problem-solving that sets a new standard in the field. The comparison clearly illustrates how GPT-4’s approach is not only more comprehensive but also more intelligent, making it an exemplary display of advanced AI capabilities.


You clearly used GPT-4 to write the "grading of both answers" section. I'm not interested in reading that.


Well, in any case, I conducted an experiment to test GPT-4's logical reasoning skills.

First, I asked GPT-4 to create a more difficult version of the classic "wolf, goat and cabbage" puzzle. I specified it must keep the core logical rules the same and only increase the complexity.

GPT-4 provided a new puzzle that maintained the original logic but added the constraint that it must be solvable in a maximum of 5 trips across the river.

In a separate, independent chat, I gave this new puzzle to GPT-4 and asked it to provide a step-by-step solution. It output an answer.

Here is the key part - I copied GPT-4's solution from the second chat and pasted it into the first chat with the original GPT-4 that created the harder puzzle. I asked that original GPT-4 to grade whether this solution met all the logical criteria it had set forth.

Remarkably, this first GPT-4 was able to analyze the logic of an answer it did not generate itself. It confirmed that the solution made good strategic decisions and met the logical constraints it had itself defined for solving the puzzle in a maximum of 5 trips.

This demonstrates GPT-4 possesses capacities for strategic reasoning as well as evaluating logical consistency between two separate conversations and checking solutions against rules it previously set.

https://chat.openai.com/share/996583dd-962b-42a8-b4b9-e29c59...
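
A solution like this can also be checked mechanically instead of by a second chat. Below is a minimal sketch, assuming the classic rules (wolf and goat, or goat and cabbage, may not be left together without the farmer). The names `is_safe`, `shortest_plan` and `check_plan` are my own and do not come from the linked chat. It searches breadth-first for the shortest sequence of one-way crossings and can replay any proposed plan step by step.

```python
from collections import deque

ITEMS = ("wolf", "goat", "cabbage")
# Pairs that must not be left together without the farmer (classic rules).
FORBIDDEN = ({"wolf", "goat"}, {"goat", "cabbage"})


def is_safe(left, farmer_on_left):
    """Return True if the bank WITHOUT the farmer holds no forbidden pair."""
    unattended = set(ITEMS) - set(left) if farmer_on_left else set(left)
    return not any(pair <= unattended for pair in FORBIDDEN)


def shortest_plan():
    """Breadth-first search; returns the cargo carried on each one-way crossing."""
    start = (frozenset(ITEMS), True)   # everything and the farmer on the left bank
    goal = (frozenset(), False)        # everything and the farmer on the right bank
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, farmer_left), path = queue.popleft()
        if (left, farmer_left) == goal:
            return path
        here = left if farmer_left else frozenset(ITEMS) - left
        for cargo in (None, *sorted(here)):  # None means crossing alone
            new_left = set(left)
            if cargo is not None:
                if farmer_left:
                    new_left.remove(cargo)
                else:
                    new_left.add(cargo)
            state = (frozenset(new_left), not farmer_left)
            if state not in seen and is_safe(*state):
                seen.add(state)
                queue.append((state, path + [cargo]))
    return None


def check_plan(plan):
    """Replay a plan (list of cargo names or None) and validate every step."""
    left, farmer_left = set(ITEMS), True
    for cargo in plan:
        here = left if farmer_left else set(ITEMS) - left
        if cargo is not None:
            if cargo not in here:
                return False  # cargo is not on the farmer's bank
            if farmer_left:
                left.remove(cargo)
            else:
                left.add(cargo)
        farmer_left = not farmer_left
        if not is_safe(left, farmer_left):
            return False
    return not left and not farmer_left


if __name__ == "__main__":
    plan = shortest_plan()
    print(len(plan), plan)
    print(check_plan(["wolf", None, "goat", None, "cabbage"]))
```

Under these rules the search returns a plan of 7 one-way crossings, so whether a "5 trips" limit is satisfiable depends on how a trip is counted (one-way crossings versus round trips).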


What if, in a different chat session, GPT gives the exact opposite answer, i.e., it says the offered solution is bogus? Would you even know unless someone tried it and showed it to be so? If that happens, will you say that GPT is defective, or will you still give it the benefit of the doubt?

Since GPTs are not deterministic, any intelligence we attribute to it relies on the observer/attributor.
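
One way to make this measurable is to run the same grading prompt several times and look at how often the verdicts agree. A minimal sketch, assuming the verdicts have already been collected as strings; the `agreement` helper and the sample verdicts are made-up placeholders, not real GPT-4 output.

```python
from collections import Counter


def agreement(verdicts):
    """Return the most common verdict and the fraction of runs that gave it."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    top, count = Counter(verdicts).most_common(1)[0]
    return top, count / len(verdicts)


if __name__ == "__main__":
    # Hypothetical verdicts from five independent sessions (placeholder data).
    runs = ["valid", "valid", "bogus", "valid", "bogus"]
    verdict, share = agreement(runs)
    print(verdict, share)
```

A share close to 1.0 means the sessions agree; a share near 0.5 on a yes/no question means a single session tells you very little.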

My sense is that confirmation bias and cherry-picking are playing a role in the general consensus that GPTs are intelligent.

For example, people show off beautiful images created by image generators like DALL-E while quietly discarding the ones that were terrible or completely missed the mark.

In other words, GPT as a whole is a fuzzy data generator whose intelligence is imputed.

My suspicion is that GPT is going to be upper-bounded by the average intelligence of humanity as a whole.


This is not evidence of strategic reasoning.

You are assuming human-style thinking and object modeling is going on. You have provided enough data for it to do the analysis from the text alone.


Not included is the second, isolated chat that I retrieved the answer from.



