Huh, this is the least interesting thing I've written about prompt injection in the last few weeks, but the only one to make it to the Hacker News homepage.
Better recent posts:
- Delimiters won’t save you from prompt injection - https://simonwillison.net/2023/May/11/delimiters-wont-save-y... - talks about why telling a model to follow delimiters like ``` won't protect against prompt injection, despite that being mentioned as a solution in a recent OpenAI training series
- Prompt injection explained, with video, slides, and a transcript - https://simonwillison.net/2023/May/2/prompt-injection-explai... - a 12 minute video from a recent LangChain webinar I participated in where I explain the problem and why none of the proposed solutions are effective (yet)
- The Dual LLM pattern for building AI assistants that can resist prompt injection - https://simonwillison.net/2023/Apr/25/dual-llm-pattern/ - my attempt at describing a way of building AI assistants that can safely perform privileged actions even in the absence of a 100% reliable defense against prompt injection
I've emailed a repost invite to the submitter of https://news.ycombinator.com/item?id=35803564 ("Prompt Injection Explained"). Invited reposts go into the second-chance pool once they're submitted. If the article hasn't appeared after (say) a couple weeks or so, someone else is welcome to post it and email hn@ycombinator.com and we'll put it in the SCP.
I've emailed you a repost invite for https://news.ycombinator.com/item?id=35705159 ("The Dual LLM pattern for building AI assistants that can resist prompt injection"). It would be good to space these out, so maybe wait to use that link until a few days have gone by without one of your posts basking in front page glory?
Thanks for all the work figuring out this stuff and explaining it to the rest of us! It's amazing what a good writer can do when self-employed (https://news.ycombinator.com/item?id=35925266).
Thanks! Love your writing. One question for you - how do you absorb these new concepts and experiment with them so quickly? It seems like you have the output of a small team, not just one person.
you don't really know how new that is for different people. I'd imagine 3 months of learning about something interesting gives you quite a good idea about the topic, at least good enough to write about it.
Delimiters are shown quite often as possible mitigations, but they do not work. I had the same observation when doing the Prompt Engineering class from OpenAI/DeepLearningAI.
Basically every example was vulnerable, and I made it a special challenge to perform an indirect prompt injection for each one of them. This led to interesting exploits such as JSON object injections, HTML injection and even XSS. Overwriting order prices with the OrderBot was also quite fun. :)
Better recent posts:
- Delimiters won’t save you from prompt injection - https://simonwillison.net/2023/May/11/delimiters-wont-save-y... - talks about why telling a model to follow delimiters like ``` won't protect against prompt injection, despite that being mentioned as a solution in a recent OpenAI training series
- Prompt injection explained, with video, slides, and a transcript - https://simonwillison.net/2023/May/2/prompt-injection-explai... - a 12 minute video from a recent LangChain webinar I participated in where I explain the problem and why none of the proposed solutions are effective (yet)
- The Dual LLM pattern for building AI assistants that can resist prompt injection - https://simonwillison.net/2023/Apr/25/dual-llm-pattern/ - my attempt at describing a way of building AI assistants that can safely perform privileged actions even in the absence of a 100% reliable defense against prompt injection
More of my writing about prompt injection:
- https://simonwillison.net/series/prompt-injection/
- https://simonwillison.net/tags/promptinjection/