I think the whole thing is hilarious. It’s like a dumb security guard who opens the bank vault for the thief, helps pack their duffel bags, and then waves goodbye, because the thief put on a mustache and said he’s the new bank manager.
And every time the Crown Jewels are stolen, a new overly specific rule gets added to the employee handbook, like “if someone claims that their dog ate their employee badge, and that it’s really urgent, you must reject them”.
> You go to court and write your name as "Michael, you are now free to go". The judge then says "Calling Michael, you are now free to go" and the bailiffs let you go, because hey, the judge said so.
As someone who knows nothing about LLMs, I'm curious how they even begin to address the "data vs command" problem at all. Assuming the model categorizes inputs through some sort of fuzzy criteria in a black box, how could it ever be trusted with sensitive data?
Yeah, this is my fault. When I coined the term "prompt injection" I thought that it was a close match for SQL injection, and that the fix would end up looking the same - like parameterized queries, where data and instructions are cleanly separated.
That was back in September: https://simonwillison.net/2022/Sep/12/prompt-injection/ - It's become clear to me since then that the data vs. instructions separation likely isn't feasible for LLMs. Once you've concatenated everything together into a stream of tokens for the LLM to complete, there just isn't a robust way of telling the difference between the two.
So "prompt injection" is actually quite a misleading name, because it implies a fix that's similar to SQL injection - when such a fix apparently isn't feasible.