
It's using tree search (tree of thoughts), driven by RL-derived heuristics that control which parts of the practically infinite set of potential responses to explore.

How good the responses are will depend on how good these heuristics are.
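
For intuition, here's a minimal sketch of what heuristic-guided tree search over candidate continuations could look like. The `expand` and `score` callbacks are hypothetical stand-ins for the model's generator and the RL-derived value heuristic; nothing here is from OpenAI, it's just the generic shape of the technique:

  import heapq

  def best_first_search(root, expand, score, budget=100):
      # Frontier ordered by heuristic score (negated, since heapq is a
      # min-heap). `score` plays the role of the RL-derived heuristic;
      # `expand` yields child states (partial responses / reasoning
      # steps). Both are hypothetical placeholders.
      counter = 0  # tie-breaker so the heap never compares states directly
      frontier = [(-score(root), counter, root)]
      best = root
      while frontier and budget > 0:
          neg_score, _, node = heapq.heappop(frontier)
          budget -= 1
          if -neg_score > score(best):
              best = node
          for child in expand(node):
              counter += 1
              heapq.heappush(frontier, (-score(child), counter, child))
      return best

Which is exactly why the heuristics matter: with a weak `score`, the whole budget gets burned on unpromising branches.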




That doesn't sound like a method for reasoning.


It's hard to judge how similar the process is to human reasoning (which is effectively also a tree search), but apparently the result is the same in many cases.

They only describe the process vaguely:

"Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working. This process dramatically improves the model’s ability to reason."


Not sure the way to superior "reasoning machines" is to emulate humans.


True, although it's not clear exactly what this is really doing. The RL was presumably trained on human input, but the overall agentic flow (it seems this is an agent) sounds to me like a neuro-symbolic hybrid, potentially brute-force iterating to great depth, so maybe more computer-inspired than brain-inspired.
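
If it really is brute-force iterating to depth, iterative deepening is the classic computer-flavoured way to do that: cheap shallow solutions come out first, but the search can go arbitrarily deep if the budget allows. A generic sketch, with `children` and `is_goal` as assumed placeholders, not a claim about o1's internals:

  def iterative_deepening(root, children, is_goal, max_depth=20):
      # Repeatedly run a depth-limited DFS, raising the limit each round.
      def dls(node, limit):
          if is_goal(node):
              return node
          if limit == 0:
              return None
          for child in children(node):
              found = dls(child, limit - 1)
              if found is not None:
                  return found
          return None

      for depth in range(max_depth + 1):
          found = dls(root, depth)
          if found is not None:
              return found
      return None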

It seems easy to imagine this type of approach being superhuman on narrow tasks that play to its strengths, such as pure reasoning tasks (math/science), but it's certainly not AGI: for example, there is no curiosity to explore the unknown, no ability to learn from exploration, etc.

It'll take a while before it becomes apparent exactly what types of real-world application this is useful for, both in terms of capability and cost.


Even on narrow tasks: could you imagine such a system proving (or disproving) the Riemann hypothesis?

It feels more suited to narrow tasks with a kind of well-defined approach, perhaps?


I agree, but it remains to be seen how that "feels" for everyday tasks where the underlying model itself would have failed. I guess at least now it'll be able to play tic tac toe and give minimal "farmer crossing river with his chicken" solutions!



