> It simplifies prompting and makes the LLM more steerable, more useful, more helpful.

While this is true, there is also evidence that RLHF and supervised instruction tuning can hurt output quality and accuracy[1], which are often better improved through clever prompting[2].

[1] https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tr...

[2] https://yaofu.notion.site/Towards-Complex-Reasoning-the-Pola...
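For concreteness, the kind of "clever prompting" [2] discusses for complex reasoning is typically few-shot chain-of-thought: the prompt includes a worked example so the model writes out intermediate steps before answering. A minimal sketch of the two prompt styles is below; the query_llm function is a hypothetical stand-in for whatever completion API you use, not a real library call.

    # Hypothetical stand-in for an LLM completion call; swap in your provider's API.
    def query_llm(prompt: str) -> str:
        raise NotImplementedError("wire this up to an actual model")

    question = "A cafeteria had 23 apples, used 20 for lunch, then bought 6 more. How many are left?"

    # Plain prompt: ask for the answer directly.
    direct_prompt = f"Q: {question}\nA:"

    # Chain-of-thought prompt: one worked example nudges the model to
    # spell out intermediate reasoning before the final answer.
    cot_prompt = (
        "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
        "How many tennis balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
        "5 + 6 = 11. The answer is 11.\n\n"
        f"Q: {question}\nA:"
    )

    # On reasoning tasks the chain-of-thought version generally elicits a
    # step-by-step answer, which is where the accuracy gains come from.
    # answer = query_llm(cot_prompt)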



