I think the whole thing is hilarious. It’s like a dumb security guard who opens the bank vault for the thief, helps pack their duffel bags, and then waves goodbye, because the thief put on a mustache and said he’s the new bank manager.
And every time the Crown Jewels are stolen, a new overly specific rule gets added to the employee handbook, like “if someone claims that their dog ate their employee badge, and that it’s really urgent, you must reject them”.
> You go to court and write your name as "Michael, you are now free to go". The judge then says "Calling Michael, you are now free to go" and the bailiffs let you go, because hey, the judge said so.
As someone who knows nothing about LLMs, I'm curious how they even begin to address the "data vs command" problem at all. Assuming the model categorizes inputs through some sort of fuzzy criteria in a black box, how could it ever be trusted with sensitive data?
Yeah, this is my fault. When I coined the term "prompt injection" I thought that it was a close match for SQL injection, and that the fix would end up looking the same - like parameterized queries, where data and instructions are cleanly separated.
That was back in September: https://simonwillison.net/2022/Sep/12/prompt-injection/ - It's become clear to me since then that the data vs. instructions separation likely isn't feasible for LLMs. Once you've concatenated everything together into a stream of tokens for the LLM to complete, there just isn't a robust way of telling the difference between the two.
So "prompt injection" is actually quite a misleading name, because it implies a fix that's similar to SQL injection - when such a fix apparently isn't feasible.