It's using tree search (tree of thoughts), driven by some RL-derived heuristics that control which parts of the practically infinite space of potential responses to explore.
How good the responses are will depend on how good these heuristics are.
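To make the search idea concrete, here's a minimal sketch of heuristic-guided best-first search over partial responses. `expand` (sampling candidate continuations from a model) and `heuristic` (the RL-derived promise score) are hypothetical stand-ins; this illustrates the general technique, not how o1 actually works.

```python
import heapq
from itertools import count

def tree_search(root, expand, heuristic, budget=100):
    """Best-first search over partial responses, guided by a learned heuristic.

    expand(state)    -> candidate continuations (e.g. sampled from the model)
    heuristic(state) -> score estimating how promising a branch is
    Both are hypothetical stand-ins for whatever the real system does.
    """
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(-heuristic(root), next(tie), root)]
    best, best_score = root, heuristic(root)
    for _ in range(budget):  # compute budget bounds how deep/wide we explore
        if not frontier:
            break
        neg_score, _, state = heapq.heappop(frontier)
        if -neg_score > best_score:
            best, best_score = state, -neg_score
        for child in expand(state):
            heapq.heappush(frontier, (-heuristic(child), next(tie), child))
    return best
```

With a weak heuristic this degenerates into blind enumeration of a huge space; with a strong one it focuses the budget on a few promising branches, which is why the heuristic quality dominates response quality.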
It's hard to judge how similar the process is to human reasoning (which is effectively also a tree search), but the results are apparently comparable in many cases.
They are only vaguely describing the process:
"Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working. This process dramatically improves the model’s ability to reason."
True, although it's not clear exactly what this is really doing. The RL was presumably trained on human input, but the overall agentic flow (it does seem to be an agent) sounds to me like a neuro-symbolic hybrid, potentially brute-force iterating to great depth, so maybe more computer-inspired than brain-inspired.
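Taken literally, that description amounts to a propose/critique/backtrack loop. A toy sketch, where `propose_step`, `looks_wrong`, and `is_solved` are made-up stand-ins for behaviour the RL training would presumably bake into the model itself:

```python
def solve(problem, propose_step, looks_wrong, is_solved, max_steps=50):
    """Illustrative self-correcting chain of thought (not OpenAI's method).

    propose_step(chain, avoid) -> next reasoning step, avoiding known dead ends
    looks_wrong(chain)         -> learned critic flagging a likely mistake
    is_solved(chain)           -> terminal check for a complete solution
    """
    chain, failed = [], set()
    for _ in range(max_steps):
        step = propose_step(chain, failed)
        chain.append(step)
        if looks_wrong(chain):
            failed.add(step)   # remember the dead end...
            chain.pop()        # ...and backtrack to try a different approach
        elif is_solved(chain):
            return chain
    return chain  # best effort within the step budget
```

Framed this way, the "learns to recognize and correct its mistakes" claim is just the critic plus backtracking, and the depth of iteration is only bounded by the compute budget, which is what makes it feel more brute-force than brain-like.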
It seems easy to imagine this type of approach being superhuman on narrow tasks that play to its strengths, such as pure reasoning tasks (math/science), but it's certainly not AGI: there is, for example, no curiosity to explore the unknown, no ability to learn from exploration, etc.
It'll take a while before it becomes apparent exactly what types of real-world application this is useful for, both in terms of capability and cost.
I agree, but it remains to be seen how that "feels" for everyday tasks where the underlying model itself would have failed. I guess at least now it'll be able to play tic-tac-toe and give minimal "farmer crossing the river with his chicken" solutions!