
Yes, and the "de novo" explanation seems obvious, as indicated: the model was trained differently, with different reinforcement learning goals (reasoning, rather than human feedback for chat). The need for different prompting follows from the different operational behavior of a model trained this way - the prompt has to support self-evaluation against the data it contains, backtracking when the model veers away from the goals the prompt establishes, etc. - the handful of reasoning behaviors that have been baked into the model via RL.
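For concreteness, here is a rough sketch of what that difference can look like in practice. Everything in it is hypothetical - the call_model stub and the prompt wording are just illustrations of the structural difference, not any particular vendor's API:

    # Hypothetical sketch: contrasting a chat-style prompt with a
    # reasoning-style prompt. call_model() is a stand-in for whatever
    # LLM API you actually use.

    def call_model(prompt: str) -> str:
        """Stand-in for a real LLM API call."""
        raise NotImplementedError

    # Chat-style prompt: spell out the steps, since an RLHF-tuned chat
    # model is optimized to follow instructions rather than plan on its own.
    chat_prompt = (
        "You are a helpful assistant. Summarize the sales data below.\n"
        "Step 1: compute the total per region.\n"
        "Step 2: rank regions by total.\n"
        "Step 3: write a three-sentence summary.\n\n"
        "DATA:\n{data}"
    )

    # Reasoning-style prompt: state the goal and supply the data needed
    # for self-evaluation, then let the model plan, check itself against
    # the data, and backtrack - the behaviors trained in via RL.
    reasoning_prompt = (
        "Goal: produce a three-sentence summary of regional sales,\n"
        "ranked from highest to lowest total. Verify your ranking\n"
        "against the data before writing the summary.\n\n"
        "DATA:\n{data}"
    )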


