The cat is out of the bag, and I don't see any reason training should be any more controlled than me personally viewing something and 'training' my brain on it. Using either to duplicate copyrighted works is already clearly illegal.
It is illegal for you to download copyrighted material and distribute it as your own. Models trained on such data can produce (and are statistically more likely to produce) output similar to their training input.
So training must consider licensing where copyrighted material is used, rather than consuming all data.
Your brain is not a model. You cannot reproduce most of what you see. You're not "training" your brain by glancing at an image, as your recall of that image will be terrible.
My brain can certainly recreate something it’s seen before. And it can certainly create something similar to a thing it’s seen before. It’s legal to do the latter and illegal to do the former. Imperfections in an exact recreation don’t affect its legality.
Am I violating copyright law because I am merely capable of producing a copy of something? Obviously not. Why should the model be?
For the same reason that the police being able to have a person look up in a physical printed file who owns a particular car via its license plate is not the same as having a network of cameras and computers that track every car in the city.
Yeah, I don't have any problem with that either. If a cop has a right to see me, he should be legally allowed to record me (and in fact I would prefer all cop interactions were recorded). A camera + AI allows for massive cost savings on basic police work, enabling police to be more efficient. A camera has a lot less bias than a cop.
It's because you (and all of us) have a teeny human brain, and these are terrible at remembering things, so the teeny little bits you can remember are protected under Fair Use.