There’s a lot of text out there that depicts people doing bad things from their own point of view. It’s possible that the model can get really good at generating that kind of text (or inhabiting that world model, if you’re generous about the capabilities of LLMs). If the right prompt pushes it into that corner of probability space, all of the ethics the model has also learned may simply not factor into the output. AI safety people are interested in making sure the model’s understanding of ethics is reliably incorporated into its behavior. Ideally we want AI agents to have some morals (especially when empowered to act in the real world), not just know what morals are if you ask them.
> Ideally we want AI agents to have some morals (especially when empowered to act in the real world), not just know what morals are if you ask them.
Really? I just want a smart query engine where I don't have to structure the input data. Why would I ask it any question that implies a moral quandary?