BoorishBears on Oct 14, 2023 | on: OpenAI is too cheap to beat

RLHF isn't used to teach the model what it knows; it's used to teach the model how to follow instructions.

Before RLHF instruct tuning, the models could only complete sentences.

Technically they still complete sentences, but now they have a strong association with a format where a question is followed by an answer.
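
To make the "question followed by an answer" format concrete, here is a rough sketch of the kind of framing involved. The template string and the `build_prompt` helper are made up for illustration; real instruction-tuned models each use their own format, and this is not any particular vendor's.

```python
# Hypothetical template (an assumption for illustration, not a real model's format).
PROMPT_TEMPLATE = "Question: {question}\nAnswer:"


def build_prompt(question: str) -> str:
    """Wrap a question so that the most likely continuation is an answer."""
    return PROMPT_TEMPLATE.format(question=question.strip())


# A base model given only the raw question might just continue the text,
# for example with more questions. Given the framed version, "complete the
# text" and "answer the question" become the same task.
raw_prompt = "What is the capital of France?"
framed_prompt = build_prompt(raw_prompt)
print(framed_prompt)
```

The point is only that the model is still predicting the next tokens; the tuning makes it expect this kind of framing even when the user types the bare question.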