Before RLHF and instruction tuning, the models could only complete text.
Technically they still just complete text, but now they have a strong association with a format where a question is followed by an answer.
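
To make that concrete, here is a rough sketch of what "a format where a question is followed by an answer" can look like from the prompt side. The `Question:` / `Answer:` markers and the `build_prompt` helper are made up for illustration; real tuned models each use their own template and special tokens.

```python
# Hypothetical template: an assumption for illustration, not any
# particular model's real chat format.
TEMPLATE = "Question: {question}\nAnswer:"


def build_prompt(question: str) -> str:
    """Wrap a question in a question/answer format.

    The model still only continues the text it is given; the template
    just makes an answer the most likely continuation.
    """
    return TEMPLATE.format(question=question.strip())


prompt = build_prompt("What is the capital of France?")
print(prompt)
```

The text the model generates after the trailing `Answer:` is what gets shown to the user as the reply.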