
Would you like to bet money that 365 days from now, websites won’t be able to affect Bing the way that you’ve demonstrated in this PoC? I’ll happily take you up on whatever sum you choose.

I didn’t say it was easy. I said it’s inevitable. There are straightforward ways to deal with this; all OpenAI and Microsoft need to do is choose one and implement it.

Having a conversation with a user was also an undecidable task until one day it wasn’t. And the reason it became tractable is that RL was used to reward the model for being conversational. It’s extremely straightforward to punish the model for misbehaving due to website injections, and the generalization of that is to punish the model for misbehaving due to text between two special BPE tokens (escaped text, i.e. website data).

This is different from users being able to jailbreak ChatGPT or Bing with prompts. When the user is prompting, they’re programming the model. So I agree that they won’t be able to defend against DAN attacks very easily without compromising the model’s performance in other areas. But that’s entirely different from sanitizing website data that Bing is merely looking at; such data can be trivially escaped with BPE tokens and RLHF will do the rest.
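
To make the escaping idea concrete, here’s a rough sketch of what I mean by wrapping website data between two special tokens. The sentinel strings and the function names (`escape_untrusted`, `build_prompt`) are made up for illustration; a real system would presumably reserve dedicated token IDs that ordinary text can never encode to, and I have no idea what Bing actually does internally. This only covers the delimiting half; the RL half (penalizing the model for obeying anything inside the delimiters) isn’t shown.

```python
# Hypothetical sentinel strings standing in for two reserved BPE tokens.
# Assumption: in a real tokenizer these would be special token IDs, not text.
UNTRUSTED_OPEN = "<|untrusted|>"
UNTRUSTED_CLOSE = "<|/untrusted|>"


def escape_untrusted(text: str) -> str:
    """Wrap website text in sentinels after removing any sentinel look-alikes."""
    # Repeat until nothing changes, so that deleting one occurrence
    # cannot splice the surrounding characters into a fresh sentinel.
    previous = None
    while previous != text:
        previous = text
        text = text.replace(UNTRUSTED_OPEN, "").replace(UNTRUSTED_CLOSE, "")
    return f"{UNTRUSTED_OPEN}{text}{UNTRUSTED_CLOSE}"


def build_prompt(user_message: str, page_text: str) -> str:
    """Assemble a prompt where only the user message sits outside the sentinels."""
    return f"User: {user_message}\nPage: {escape_untrusted(page_text)}\nAssistant:"
```

The point is that a page can’t close the untrusted region early by including the closing sentinel itself, so everything it says stays inside the span the model was trained to treat as data rather than instructions.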

If you do want to take me up on that bet, feel free to DM me on Twitter and we can hammer out the details. I’ll go any amount from $5 to $5k.

Note that I’m not claiming that it’ll be impossible to craft a website that makes Bing go haywire, just that it’ll be so uncommon as to be pretty much impossible in practice, the same way that SQL injection attacks against AWS are rare but technically not impossible. We’ll hear about them as a CVE, Microsoft will fix the CVE, and life moves on, just like today with every other type of attack. The bet is that there are straightforward, quick (< 1 week) fixes for these problems, 365 days from today.



