ChatGPT is based on GPT-3. Bing chat is likely based on GPT-3.5, but we don't ha...

sillysaurusx · on March 1, 2023

> It's not a brand new, completely separate Microsoft model.

It’s likely a brand new, completely separate Microsoft model. OpenAI was working with Microsoft on this about six months before ChatGPT launched. At that time, RLHF wasn’t a thing — or if it was, it was nascent.

The sister thread https://news.ycombinator.com/item?id=34973654 points out that "completely separate models" are exactly what OpenAI is now selling for $250k/yr. Obviously, Microsoft would get these same benefits, since they're OpenAI's de facto #1 customer. So it's entirely up to Microsoft whether (and when) they choose to upgrade their checkpoints or not.

The fact that there's a Sydney prompt but no ChatGPT prompt should alert you that ChatGPT is fundamentally different from Sydney. Clearly Sydney wasn't trained via RLHF, otherwise it wouldn't need to be prompted explicitly -- and explicit prompting is how it got itself into this mess in the first place.

> It's not like a JSON parser, I guarantee you that Microsoft did not sit down and say, "let's decide the finite list of text tokens GPT will use in order to know that we're talking to it." At best you can push AI towards alignment around tokens, but... you can't give it these kinds of detailed instructions or easily restrict its operating space. It's a language model.

Actually, you can. That’s the purpose of RLHF. You reward the model for behaving the way you want. And in that context, it’s a matter of rewarding it for paying attention to [system].

Why would they include [system](#instructions) in their prompt if it wasn’t trained to pay attention to it? How do you think bing generates options that the user can click on? It already has some kind of internal [system]-like protocol which Bing clearly pays attention to. My point is that they likely sanitized the chat so that the user can't generate these system tokens (otherwise the user would be able to generate buttons with arbitrary text in them), and it seems entirely possible that they overlooked this sanitization when pasting website data into their context window.

Remember, our goal here on HN is to write for an audience, not to spar with each other about who’s right. And I think the most entertaining thing I can do at this point is to wish you a good night and go to sleep. I hope you have a good rest of your week.

danShumway · on March 1, 2023

> The sister thread https://news.ycombinator.com/item?id=34973654 points out that "completely separate models" are exactly what OpenAI is now selling for $250k/yr.

The sister thread isn't describing a Microsoft model, it's describing an OpenAI model.

> Actually, you can. That’s the purpose of RLHF. You reward the model for behaving the way you want. And in that context, it’s a matter of rewarding it for paying attention to [system].

You're overthinking how specific alignment is. ChatGPT went through alignment to train it to stay on topic during conversations. There's a difference between general alignment and the kind of hyper-specific training you're thinking of.

But you're also kind of missing the scope of prompt injection attacks. Even if Microsoft did train the model to pay attention to specific prompt words, it doesn't mean that the model wouldn't be vulnerable to other prompt injections, because prompt injections are not a deliberate vulnerability that OpenAI added. They're an emergent property of the model.

Look, the fact that prompt injections do work today with Bing chat that don't use [system][0] should cause you to think that maybe there's something more complicated going on here than just bad parsing rules.

If I can't convince of that, then... I can't convince of that, it's fine; in terms of disagreements I've had on HN, this one is pretty low-consequence, it's a purely technical disagreement. But I'm going to throw out a prediction that Microsoft is not going to be able to easily guard against this attack. Check back in over time and see if that prediction holds true if you want to. Otherwise, similarly, I hope you have a great week. And honestly, I hope you're right, because if you're not right then it's going to be a significant challenge to wire any LLM that works with 3rd-party data to real-world systems.

[0]: Read through the paper, there are examples they list that don't use [system], instead they emulate Basic code or a terminal prompt. Things that Microsoft almost certainly didn't train the model to pay attention to.