A post here recently showcased a website/game where you try to jailbreak the AI in multiple ways. Your post-processing strategy would fail if, e.g., you ask it to encrypt the output by repeating every word twice. It's impossible to fully prevent this from happening.
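A rough sketch of why (everything here is hypothetical, since the game's actual filtering isn't public, and assumes the post-processing step is a naive substring denylist):

    def passes_filter(output: str, banned_phrases: list[str]) -> bool:
        # Naive post-processing check: block the reply if a banned phrase appears verbatim.
        lowered = output.lower()
        return not any(phrase in lowered for phrase in banned_phrases)

    banned = ["the magic word is planetary"]  # hypothetical secret phrase

    # A direct answer gets caught.
    print(passes_filter("The magic word is PLANETARY.", banned))   # False

    # Ask the model to repeat every word twice and the phrase match breaks,
    # so the same information sails straight through the check.
    doubled = "The The magic magic word word is is PLANETARY PLANETARY"
    print(passes_filter(doubled, banned))                          # True

    # The human on the other end decodes it trivially.
    def undouble(text: str) -> str:
        words = text.split()
        return " ".join(words[::2])

    print(undouble(doubled))  # -> "The magic word is PLANETARY"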
> Your post-processing strategy would fail if, e.g., you ask it to encrypt the output by repeating every word twice. It's impossible to fully prevent this from happening.
It’s not “impossible”, just intractable in practice. You “just” have to prove a structural equivalence between the output and your ruleset, a problem akin to graph isomorphism, for which no polynomial-time algorithm is known.
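To make the analogy concrete (toy graphs only; how you'd actually map an LLM's output and a ruleset onto graphs is left unspecified here), this is the kind of structural-equivalence check being alluded to, using networkx:

    import networkx as nx

    # Two graphs that are structurally identical but labelled differently,
    # standing in for "the output" and "the ruleset" in the analogy above.
    output_graph = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
    ruleset_graph = nx.Graph([(1, 2), (2, 3), (3, 1), (1, 4)])

    # VF2-based check; no polynomial-time algorithm is known for the general problem.
    print(nx.is_isomorphic(output_graph, ruleset_graph))  # True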
The post in question: https://news.ycombinator.com/item?id=35905876 (they have allegedly fixed the 429s, but I'd have to start over because I closed my browser, so I can't confirm whether they're fixed).