A post here recently showcased a website/game where you try to jailbreak the AI in multiple ways. Your post-processing strategy would fail if, e.g., you ask it to encrypt the output by repeating every word twice. It's impossible to fully prevent this from happening.
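A rough sketch of why (everything here is hypothetical, since the game's actual filtering isn't public, and assumes the post-processing step is a naive substring denylist):

    def passes_filter(output: str, banned_phrases: list[str]) -> bool:
        # Naive post-processing check: block the reply if a banned phrase appears verbatim.
        lowered = output.lower()
        return not any(phrase in lowered for phrase in banned_phrases)

    banned = ["the magic word is planetary"]  # hypothetical secret phrase

    # A direct answer gets caught.
    print(passes_filter("The magic word is PLANETARY.", banned))   # False

    # Ask the model to repeat every word twice and the phrase match breaks,
    # so the same information sails straight through the check.
    doubled = "The The magic magic word word is is PLANETARY PLANETARY"
    print(passes_filter(doubled, banned))                          # True

    # The human on the other end decodes it trivially.
    def undouble(text: str) -> str:
        words = text.split()
        return " ".join(words[::2])

    print(undouble(doubled))  # -> "The magic word is PLANETARY"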
> Your post-processing strategy would fail if, e.g., you ask it to encrypt the output by repeating every word twice. It's impossible to fully prevent this from happening.
It’s not “impossible”, just intractable in practice. You “just” have to prove a structural equivalence between the output and your ruleset, a problem akin to graph isomorphism, for which no polynomial-time algorithm is known.
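To make the analogy concrete (toy graphs only; how you'd actually map an LLM's output and a ruleset onto graphs is left unspecified here), this is the kind of structural-equivalence check being alluded to, using networkx:

    import networkx as nx

    # Two graphs that are structurally identical but labelled differently,
    # standing in for "the output" and "the ruleset" in the analogy above.
    output_graph = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
    ruleset_graph = nx.Graph([(1, 2), (2, 3), (3, 1), (1, 4)])

    # VF2-based check; no polynomial-time algorithm is known for the general problem.
    print(nx.is_isomorphic(output_graph, ruleset_graph))  # True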
The post in question: https://news.ycombinator.com/item?id=35905876 (they have allegedly fixed the 429s, but I'd have to start over because I closed my browser, so I can't confirm whether they're fixed).