
RLHF isn't used to teach the model what it knows; it's used to teach the model how to follow instructions.

Before RLHF instruction tuning, the models could only complete sentences.

Technically they still complete sentences, but now they have a strong association with a format where a question is followed by an answer.
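
To make the "question followed by an answer" format concrete, here's a rough sketch of how a question can be framed as text to be completed. The "Question:/Answer:" template and the build_prompt helper are made up for illustration; actual instruction-tuned models each use their own format.

```python
# Hypothetical template, for illustration only. Real instruction-tuned
# models each define their own prompt format.
TEMPLATE = "Question: {question}\nAnswer:"


def build_prompt(question: str) -> str:
    """Frame a question so the text ends right where an answer would begin."""
    return TEMPLATE.format(question=question.strip())


prompt = build_prompt("What is the capital of France?")
print(prompt)
```

The model is still just predicting what comes next, but after tuning on text shaped like this, the most likely continuation after "Answer:" is an answer rather than, say, more questions.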



