Hacker News new | past | comments | ask | show | jobs | submit login

> The systems that I see most commonly deployed in practice are chatbots that use retrieval-augmented generation. These chatbots are typically very constrained: they can't use the internet, they can't execute tools, and essentially just serve as an interface to non-confidential knowledge bases.

Since everything from RAG runs through the prompt, unintended prompt-induced behavior is still an issue, even if its not an information-leak issue and you aren't using untrusted third-party data where deliberate injection is likely. E.g., for a somewhat contrived case that is an easy illustration, if your data store you were using the LLM to reference was itself about use of LLMs, you wouldn't want a description of an exploit that causes non-obvious behavior to trigger that behavior whenever it is recalled through RAG.




> Since everything from RAG runs through the prompt, unintended prompt-induced behavior is still an issue, even if its not an information-leak issue

It also doesn't completely safeguard a system against attacks.

See https://kai-greshake.de/posts/inject-my-pdf/ as an example of how information poisoning can be a problem even if there's no risk of exfiltration and even if the data is already public.

I have seen debate over whether this kind of poisoning attack should be classified as a separate vulnerability (I lean towards yes, it should, but I don't have strong opinions on that). But regardless of whether it counts as prompt injection or jailbreaking or data poisoning or whatever, it shares the same root cause as a prompt injection vulnerability.

---

I lean sympathetic to people saying that in many cases tightly tying down a system, getting rid of permissions, and using it as a naive data parser is a big enough reduction in attack surface that many of the risks can be dismissed for many applications -- if your data store runs into a problem processing data that talks about LLMs and that makes it break, you laugh about it and prune that information out of the database and move on.

But it is still correct to say that the problem isn't solved, all that's been done is that the cost of the system failing has been lowered to such a degree that the people using it no longer care if it fails. I sort of agree with GP that many chat bots don't need to care about prompt injection, but they've only "solved" the problem in the same way that me having a rusted decrepit bike held together with duck tape has "solved" my problem with bike theft -- in the sense that I no longer particularly care if someone steals my bike.

If those systems get used for more critical tasks where failure actually needs to be avoided, then the problem will resurface.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: