Jeremy -- this is interesting and worthwhile. Thank you!
In the same spirit (ignoring the question of whether this sort of attempted regulation is a good idea), I have a question:
Debating release vs. deploy seems a bit like regulating e.g. explosives by saying "you can build the bomb, you just aren't allowed to detonate it". Regulation often addresses the creation of something dangerous, not just the usage of it.
Did you consider an option to somehow push the safety burden into the training phase? E.g. "you cannot train a model such that at any point the following safety criteria are not met." I don't know enough about how the training works to understand whether that's even possible -- but solving it 'upstream' makes more intuitive sense to me than saying "you can build and distribute the dangerous box, but no one is allowed to plug it in".
(Possibly irrelevant disclosure: I worked with Jeremy years ago and he is much smarter than me!)
Yes I considered that option, but it's mathematically impossible. There's no way to make it so that a general purpose learned mathematical function can't be tweaked downstream to do whatever someone chooses.
So in that sense it's more like the behaviour of the pen and paper, or a printing press, than explosives. You can't force a pen manufacturer to only sell pens that can't be used to write blackmail, for instance. They simply wouldn't be able to comply, and so such a regulation would effectively ban pens. (Of course, there's also lots of ways in which these technologies are different to AI -- I'm not making a general analogy here, just an analogy to show why this particular approach to regulation is impossible.)
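To make the "tweaked downstream" point concrete, here is a minimal sketch of the kind of fine-tuning anyone holding released weights can run, regardless of what the original training enforced. It assumes the Hugging Face transformers and datasets libraries; the model name and data file are placeholders, not real artifacts:

```python
# Minimal sketch: downstream fine-tuning of released open weights.
# The model name and data file below are hypothetical placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "some-org/released-open-weights-model"  # hypothetical released model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
if tokenizer.pad_token is None:  # many causal LM tokenizers lack a pad token
    tokenizer.pad_token = tokenizer.eos_token

# The downstream user supplies whatever text they like -- including data chosen
# to train away any behavioural constraints the original developer instilled.
dataset = load_dataset("text", data_files={"train": "my_arbitrary_corpus.txt"})["train"]
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # gradient descent moves the weights wherever the new data points
```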
I would not say it's impossible… my lab is working on this (https://arxiv.org/abs/2405.14577), and though it's far from mature, in theory some kind of resistance to downstream training isn't impossible. I think under classical statistical learning theory you would predict it's impossible with unlimited training data and an unlimited budget for searching over models, but we don't have those same guarantees with deep neural networks.
That makes sense. Regulating deployment may simply be the only option available -- no other mechanism (besides banning the release of models altogether) is on the menu.