This feels to me like the most useless definition of "AI safety" in practice, an...

This feels to me like the most useless definition of "AI safety" in practice, and it's astonishing to see just how much R&D efforts are spent on it.

Thankfully the open-weights models are trivially jailbreakable regardless of any baked-in guardrails simply because one controls the generation loop and can make the model not refuse.