How RLHF Works (interconnects.ai)
165 points by natolambert on June 21, 2023 | 32 comments



This says nothing about how RLHF works, but a lot about what its results can be.


You can check here for an explanation (with some helpful figures) https://www.assemblyai.com/blog/the-full-story-of-large-lang...


Yes! I came to make the same comment.

It's got a catchy title, but it leaves a lot unexplained.



Does anyone have any insight into why reinforcement learning is (maybe) required/historically favoured? There was an interesting paper recently suggesting that you can use a preference learning objective directly and get a similar/better result without the RL machinery - but I lack the right intuition to know whether RLHF offers some additional magic! Here’s the “Direct Preference Optimization” paper: https://arxiv.org/abs/2305.18290
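For context, the heart of DPO is a single supervised loss over preference pairs; a rough PyTorch-style sketch of it (my own paraphrase with made-up variable names, not the authors' code):

    import torch.nn.functional as F

    def dpo_loss(pi_logp_chosen, pi_logp_rejected,    # log-probs under the policy being trained
                 ref_logp_chosen, ref_logp_rejected,  # log-probs under a frozen reference model
                 beta=0.1):
        # The implicit "reward" is the log-ratio between policy and reference.
        chosen = beta * (pi_logp_chosen - ref_logp_chosen)
        rejected = beta * (pi_logp_rejected - ref_logp_rejected)
        # Logistic loss on each preference pair -- no separate reward model, no PPO loop.
        return -F.logsigmoid(chosen - rejected).mean()

So there's no sampling loop and no separate reward model, which is what makes it appealing compared with the usual RL machinery.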


> Does anyone have any insight into why reinforcement learning is (maybe) required/historically favoured?

From a concept stage, it has attractive similarities to the way people learn in real life (rewarded for success, punished for failure), and although we know similarities to nature don’t guarantee better results than the alternatives (for example, our modern airplane does not “flap” its wings the way a bird does), natural solutions will continually be looked to as a starting point and as a tool to try on new problems.

Additionally, RL gives you a good start on unclear-how-to-address problems. In spaces where it’s not clear where to begin optimizing besides taking actions and seeing how they do judged against some metric, reinforcement learning often provides a good mental and code framework for attacking these problems.

>There was a paper recently suggesting that you can use a preference learning objective directly

Doing a very quick skim, it looks like that paper argues that, rather than giving rewards or punishments based on preferences, you can just build a predictive classifier for the kinds of responses humans prefer. It seems interesting, though I wonder to what extent you still have to occasionally do that reinforcement learning to generate relevant data for evaluating the classifier.


My intuition on this:

Maximum likelihood training -> faithfully represent training data

Reinforcement learning -> seek out the most preferred answer you can
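
Or, in loss terms (an illustrative PyTorch sketch, not any particular paper's implementation):

    import torch.nn.functional as F

    # Maximum likelihood: push probability mass onto whatever the training data says.
    def mle_loss(logits, target_tokens):
        return F.cross_entropy(logits.view(-1, logits.size(-1)), target_tokens.view(-1))

    # RL (REINFORCE-style): sample your own answer, then push probability mass
    # toward whatever got a high reward -- i.e. seek out the most preferred answer.
    def rl_loss(sampled_logprobs, rewards):
        return -(sampled_logprobs * rewards).mean()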


RLHF - reinforcement learning from human feedback


A notable improvement over the GLHF strategy for interacting with GPT models.


(In case anybody's confused by the gaming culture reference: https://en.wiktionary.org/wiki/glhf. "Good Luck Have Fun")


I was familiar with that phrase and its shorthand ("GLHF"), but the latter half of the sentence ("for interacting with GPT models") muddied the punchline enough that the joke didn't land for me. The context here is using RL to "interact with GPT" (relevant to this article), whereas a more natural setting for the joke would have been regular ole RL with agents in a simulated environment, like - I don't know - a video game?

Maybe I'm overthinking it though.


Thank you!


Anyone here know where we can find more resources on RLHF?

There’s been a lot written about transformer models etc., but I wasn’t able to find much about RLHF.


Blog post from Huggingface: https://huggingface.co/blog/rlhf

Webinar on the same topic (from same HF folks): https://www.youtube.com/watch?v=2MBJOuVq380&t=496s

RLHF as used by OpenAI in InstructGPT (predecessor to ChatGPT): https://arxiv.org/abs/2203.02155 (academic paper, so much denser than the above two resources)


It will be interesting when we have AIs doing RLHF on other AIs, based on having been RLHF'd themselves, in an iterative loop of model reinforcement...

We talk of 'hallucinations', but won't we also get AI malfeasance that goes unidentified because the AI doing the RLHF can be tricked or lied to?


This is essentially the premise behind Generative Adversarial Networks, and if you've seen the results, they're astounding. They're much better for specialized tasks than their generalized GPT counterparts.


Please expand on this?


Sure thing - if you've seen "This Person Does Not Exist", it is the product of GANs: https://thispersondoesnotexist.xyz/

GANs pair a generative model with a classification model (both unsupervised) whose loss functions have been designed to be antithetical. Basically, one performing well means the other is performing poorly. Keeping with the example posed by the given link, this results in a kind of hyper-optimization that causes the generative model to gradually home in on the perfect way to render a face, while the classification model keeps pace with it and feeds back that "I don't see a face" until something resembling a face emerges. With this approach, you can start with complete noise and end up at a photorealistic face.
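
In code, "antithetical" just means two losses pulling in opposite directions. A toy PyTorch sketch, assuming G and D are arbitrary generator/discriminator modules (D outputting one logit per image):

    import torch
    import torch.nn.functional as F

    def gan_step(G, D, real_images, opt_G, opt_D, z_dim=128):
        batch = real_images.size(0)
        z = torch.randn(batch, z_dim)
        fake_images = G(z)

        # Discriminator: score real images as 1, generated images as 0.
        d_loss = F.binary_cross_entropy_with_logits(D(real_images), torch.ones(batch, 1)) + \
                 F.binary_cross_entropy_with_logits(D(fake_images.detach()), torch.zeros(batch, 1))
        opt_D.zero_grad(); d_loss.backward(); opt_D.step()

        # Generator: fool the discriminator into scoring fakes as 1 -- the opposing objective.
        g_loss = F.binary_cross_entropy_with_logits(D(fake_images), torch.ones(batch, 1))
        opt_G.zero_grad(); g_loss.backward(); opt_G.step()

The .detach() is what keeps the two objectives separate: the discriminator trains on frozen generator output, and the generator then trains against the updated discriminator.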


Lately diffusion models have surpassed GANs in pretty much every way. They don't have any of the adversarial dynamics you described.


I'm not sure that's a valid statement on either count. There is plenty of work being done to bolster GANs with diffusion, in an attempt to take GANs where they couldn't go before. Here's one such example: https://arxiv.org/abs/2206.02262

You might've been more correct to say that diffusion surpassed prior generative models, but the adversarial element isn't really comparable to diffusion at all. The adversarial element would be more accurately seen as a trade-off against standard RLHF/human-in-the-loop models.

I will bet money that GANs bolstered with diffusion will far outperform a standalone diffusion model.


It's not the first paper on the topic IIRC, but OpenAI's InstructGPT paper [0] is decent and references enough other material to get started.

The key idea is that they're able to start with large amounts of relatively garbage unsupervised data (the internet), and use that model to cheaply generate decent amounts of better data (ranking generated content rather than spending the man-hours to actually write good content). The other details aren't too important.
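
The "ranking generated content" step usually amounts to fitting a reward model to pairwise comparisons, roughly like this (an illustrative PyTorch-style sketch, not the actual InstructGPT code):

    import torch.nn.functional as F

    def reward_model_loss(score_chosen, score_rejected):
        # score_* are scalar scores the reward model assigns to two sampled
        # completions of the same prompt, where labelers preferred "chosen".
        # Bradley-Terry style: maximize the probability that chosen outranks rejected.
        return -F.logsigmoid(score_chosen - score_rejected).mean()

That reward model then stands in for the human labelers when the policy is fine-tuned with RL.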

[0] https://arxiv.org/abs/2203.02155


My colleague wrote a couple of pieces that talk about RLHF:

1. https://www.assemblyai.com/blog/the-full-story-of-large-lang... (you can scroll to "What RLHF actually does to an LLM" if you're already familiar with LLMs)

2. https://www.assemblyai.com/blog/how-chatgpt-actually-works/


There's also this exhaustive post from one and only Chip Huyen: https://huyenchip.com/2023/05/02/rlhf.html


Not very good. I just want a step-by-step, ultra-high-level explanation: 1. Build a model. 2. Run it ten times. 3. Get humans to do xyz until result abc.
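Something in the spirit of this, maybe (a very rough sketch of the standard recipe; every function name here is hypothetical, not a real library call):

    prompts = load_instruction_prompts()                   # hypothetical data loader
    base_lm = pretrain_on_internet_text()                  # plain next-token prediction
    sft_lm = finetune_on_demonstrations(base_lm)           # supervised fine-tuning on human-written answers
    comparisons = collect_human_rankings(sft_lm, prompts)  # humans rank sampled completions
    reward_model = fit_reward_model(comparisons)           # learn to predict the human preference
    policy = rl_finetune(sft_lm, reward_model, prompts)    # e.g. PPO, with a KL penalty back to sft_lm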


The problem with this is that it leads to the algorithm targeting outputs that sound good to humans. That's why it's bad and won't help us; it should also incorporate "sorry, I don't know that", but for that it needs to actually be smart.


Honesty/truthfulness is indeed a difficult problem with any kind of fine-tuning. There is no way to incentivize the model to say what it believes to be true rather than what human raters would regard as true. Future models could become actively deceptive.


It can be weighted to be more honest when it doesn't know, if labelers pick those answers.


Need smarter labelers


Any reason RLHF isn't just a band-aid on "not having enough data?"


RLHF is a band-aid on not having enough data that fits your own biases and the answers you want the model to give.


It won't give answers at all if you don't train it to. It will output more questions because that's a more obvious completion to an incoming question.


Less data can be better if the data is good: https://arxiv.org/abs/2306.11644

A language model won't develop question-answering behavior unless you train it to though.



