Hacker News

Who knows? Certainly not the public.

It might be a finetuned model that works better in such a setting.




The linked blog post explains that it is fine-tuned through some reinforcement learning process. It doesn’t go into detail, but they do claim it’s not just the base model with chain of thought; there’s some fine-tuning going on.



