Yes, and the "de novo" explanation seems straightforward, as indicated: the model was trained differently, with different reinforcement learning objectives (reasoning rather than human feedback for chat). The need for different prompting follows from the operational behavior of a model trained this way: it self-evaluates against the data present in the prompt, backtracks when it veers away from the goals established in the prompt, and exhibits the handful of other reasoning behaviors that have been baked into the model via RL.