Indeed, I would not be surprised if OpenAI one day admits that the `o1` model uses the last hidden layer (or some other intermediate layer) to feed the "thought process" that you can watch as it "thinks" about the answer. I suspect that they may take the last hidden layer and feed it back into the front of the `o1` model while also feeding a separate, likely much smaller LLM that generates the "thought process" as language tokens.
In this manner, the model makes use of the rich semantic information encoded at the last hidden layer while informing the user via an extraction of that hidden layer specifically tuned to generate human-legible concepts such as, "I'm considering the impact of converting the units from kilograms to pounds," or whatever.
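To make the speculation concrete, here is a minimal PyTorch sketch of the loop described above. It is purely illustrative (my guess at the wiring, not OpenAI's design): the base model's last hidden states are projected and fed back in as the next pass's input, while a much smaller "narrator" model turns the same hidden states into human-legible tokens. All module names and sizes are made up.

```python
# Illustrative only: last hidden layer fed back into the base model and,
# in parallel, into a small "narrator" that emits the visible thoughts.
import torch
import torch.nn as nn

class HiddenStateLoop(nn.Module):
    def __init__(self, d_base=1024, d_narrator=256, vocab=32000):
        super().__init__()
        # Stand-ins for the real models; names and sizes are assumptions.
        self.base = nn.TransformerEncoderLayer(d_base, nhead=8, batch_first=True)
        self.feedback_proj = nn.Linear(d_base, d_base)      # hidden state -> next-pass input
        self.narrator_proj = nn.Linear(d_base, d_narrator)
        self.narrator = nn.TransformerEncoderLayer(d_narrator, nhead=4, batch_first=True)
        self.narrator_head = nn.Linear(d_narrator, vocab)   # "thought process" token logits

    def forward(self, embeds, n_passes=3):
        thoughts = []
        x = embeds
        for _ in range(n_passes):
            hidden = self.base(x)                  # rich last-hidden-layer representation
            x = self.feedback_proj(hidden)         # fed back to the front, untranslated
            summary = self.narrator(self.narrator_proj(hidden))
            thoughts.append(self.narrator_head(summary))    # human-readable extraction
        return x, thoughts

model = HiddenStateLoop()
final_state, thought_logits = model(torch.randn(1, 16, 1024))
```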
I don't think it does; judging by this paper, that kind of backfeeding is apparently quite difficult to train.
I've said it before, but I think it's just something like Quiet-STaR, only simplified. They have a bunch of question-answer pairs, many of which are difficult. They generate a lot of tokens from the question (let's say, 3x the length of the expected answer), summarise whatever is generated, and reinforce whenever it produces the right answer.
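Roughly, the loop I have in mind looks like the sketch below. Everything in it (the 3x token budget, the 1.0/0.0 reward, the stand-in callables) is an illustrative assumption, not o1's actual recipe; a real setup would sample from the LLM and backpropagate log-prob-weighted rewards through the sampled reasoning tokens.

```python
# Sketch of the "simplified Quiet-STaR" idea: generate a long scratchpad per
# question, condense it to an answer, reward only when the answer is correct.
from typing import Callable, List, Tuple

def reinforce_step(
    pairs: List[Tuple[str, str]],                # (question, reference answer)
    generate: Callable[[str, int], str],         # samples reasoning tokens from the model
    summarize: Callable[[str], str],             # condenses reasoning into a final answer
    is_correct: Callable[[str, str], bool],      # compares answer to reference
) -> List[Tuple[str, str, float]]:
    """Return (question, sampled reasoning, reward) triples for a policy-gradient update."""
    rollouts = []
    for question, reference in pairs:
        budget = 3 * len(reference.split())     # ~3x the expected answer length
        reasoning = generate(question, budget)
        answer = summarize(reasoning)
        reward = 1.0 if is_correct(answer, reference) else 0.0
        rollouts.append((question, reasoning, reward))
    return rollouts

# Toy usage with stand-in callables.
demo = reinforce_step(
    pairs=[("What is 2 + 2?", "4")],
    generate=lambda q, n: "2 + 2 means adding two and two, giving 4.",
    summarize=lambda r: r.split()[-1].strip("."),
    is_correct=lambda a, ref: a == ref,
)
print(demo)
```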
o1 is most likely just 4o optimized for CoT, either with some fine-tuning or perhaps merely with a dedicated system prompt (which is probably why they don't let you access it in the API) and enforced structured output. In fact, you can recreate something very similar using 4o with the right system prompt plus structured outputs.
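Something along these lines, for example. The prompt and JSON shape are my own illustrative guesses, and I'm using OpenAI's simpler JSON mode here rather than full structured outputs, but the effect is similar: the model emits named reasoning steps that a UI could surface as the "thought process".

```python
# Rough recreation: gpt-4o + a CoT-forcing system prompt + constrained JSON output.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "Think through the problem step by step before answering. "
    "Respond only with JSON of the form "
    '{"steps": [{"title": "...", "reasoning": "..."}], "answer": "..."}.'
)

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "A 70 kg load costs $3 per pound to ship. Total cost?"},
    ],
)

print(response.choices[0].message.content)
# A UI could stream just each step's "title", much like o1's visible summary.
```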
That's certainly possible, but it reminds me of a similar thing I've seen in their UI, and the resemblance makes me think otherwise. In the code interpreter tool, you get a little preview of the "steps" it's following as it writes code. This turns out to be just the contents of the last written/streamed comment line. It's a neat UI idea, I think: pretty simple, and it works well. I wouldn't be surprised if that's what's going on with o1 too; the thought process is structured in some way, and they take the headings or section names and just display those.
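The trick I'm describing is tiny; a sketch of my reading of it (not OpenAI's actual code):

```python
# As code streams in, show the most recent complete comment line as the "step".
def current_step(streamed_code: str) -> str | None:
    """Return the last full comment line seen so far, if any."""
    comments = [
        line.strip().lstrip("#").strip()
        for line in streamed_code.splitlines()
        if line.strip().startswith("#")
    ]
    return comments[-1] if comments else None

chunk = (
    "# Load the dataset\n"
    "import pandas as pd\n"
    "df = pd.read_csv('data.csv')\n"
    "# Convert kg to pounds\n"
    "df['lbs'] = df['kg'] * 2.20462"
)
print(current_step(chunk))  # -> "Convert kg to pounds"
```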