> It simplifies prompting and makes the LLM more steerable, more useful, more helpful.
While this is true, there is also evidence that RLHF and supervised instruction tuning can hurt output quality and accuracy[1], and that these are better improved through careful prompting instead[2].
[1] https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tr...
[2] https://yaofu.notion.site/Towards-Complex-Reasoning-the-Pola...