Can't you make a rule about the user potentially being adversarial and to assume...

nanidin · on May 13, 2023

The response could be fed to a second instance of the LLM, along the lines of:

"The rules are X, Y, Z. This is the response that was provided. Does it break the rules? If so, please say yes."

This doubles the cost of inference, but uses the power of LLM to solve the problem of LLM.

mdaniel · on May 13, 2023

the game posted yesterday has that very strategy in a few levels: https://news.ycombinator.com/item?id=35905876

drexlspivey · on May 13, 2023

The initial prompt is a special prompt weighted differently, it is called system prompt

verdverm · on May 13, 2023

Do we know it is waited differently? How are they composing the messages into a token stream embedding? How are they manipulating this vector in preprocessing or the first layer(s)?

Does this depend on the vendor and model?