To me it looks like they paired two instances of the model to feed off of each o...

danielmarkbruce · 2024-09-13T15:05:23 1726239923

To create the training data? Almost certainly something like that (likely more than two), but I think they then trained on the synthetic data created by this "conversation". There is no reason a model can't learn to do all of that, especially if you insert special tokens (like think, reflect etc that have already shown to be useful)

Zenzero · 2024-09-13T15:36:08 1726241768

No I'm referring to how the chain of thought transcript seems like the output of two instances talking to each other.

danielmarkbruce · 2024-09-13T16:44:07 1726245847

Right - i don't think it's doing that. I think it has likely been fine tuned to transition between roles. But, maybe you are right.