
To me it looks like they paired two instances of the model to feed off each other's outputs with some sort of "contribute to reasoning out this problem" prompt. They did several similar demonstrations of that with audio in the earlier 4o demos.
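
A rough sketch of what that pairing might look like, assuming the OpenAI chat completions API; the prompts and the turn-taking logic here are guesses for illustration, not anything confirmed:

    # Two calls to the same model alternate turns, each seeing the
    # other's latest output appended to a shared transcript.
    from openai import OpenAI

    client = OpenAI()

    def turn(system_prompt, transcript):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "system", "content": system_prompt},
                      {"role": "user", "content": transcript}],
        )
        return resp.choices[0].message.content

    PROPOSER = "Contribute the next step of reasoning toward solving the problem."
    CRITIC = "Check the reasoning so far; point out errors or confirm the step."

    transcript = "Problem: <problem statement goes here>"
    for _ in range(4):  # a few alternating turns
        transcript += "\n" + turn(PROPOSER, transcript)
        transcript += "\n" + turn(CRITIC, transcript)
    print(transcript)  # the paired "conversation"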



To create the training data? Almost certainly something like that (likely more than two instances), but I think they then trained on the synthetic data created by this "conversation". There is no reason a model can't learn to do all of that, especially if you insert special tokens (like think, reflect, etc., which have already been shown to be useful).
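
Purely illustrative sketch of how that synthetic conversation could be flattened into a single training sequence; the <think>/<reflect>/<answer> token names and the overall format are guesses, not anything published:

    # Flatten the paired-model turns into one sequence for an ordinary
    # next-token-prediction fine-tune.
    def to_training_example(problem, turns, final_answer):
        body = ""
        for speaker, text in turns:  # turns collected from the paired models
            if speaker == "proposer":
                body += "<think>" + text + "</think>"
            else:
                body += "<reflect>" + text + "</reflect>"
        return problem + body + "<answer>" + final_answer + "</answer>"

    example = to_training_example(
        "What is 17 * 24?",
        [("proposer", "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408"),
         ("critic", "Checking: 408 / 24 = 17, so the step holds.")],
        "408",
    )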


No, I'm referring to how the chain-of-thought transcript seems like the output of two instances talking to each other.


Right, I don't think it's doing that. I think it has likely been fine-tuned to transition between roles. But maybe you are right.
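
For contrast with the two-instance pairing above, a guess at what the single fine-tuned model would look like: one generate() call emits the whole back-and-forth itself, switching roles via the special tokens. The checkpoint name and the example output are hypothetical.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # "my-org/reasoning-ft" is a made-up checkpoint fine-tuned on the
    # flattened sequences sketched earlier.
    tok = AutoTokenizer.from_pretrained("my-org/reasoning-ft")
    model = AutoModelForCausalLM.from_pretrained("my-org/reasoning-ft")

    ids = tok("What is 17 * 24?", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=200)
    print(tok.decode(out[0]))
    # Hypothetical output:
    #   <think>17 * 24 = 340 + 68 = 408</think>
    #   <reflect>408 / 24 = 17, consistent.</reflect>
    #   <answer>408</answer>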



