I tend to agree with the parent: if it is legitimate to take copyrighted code, pass it through an LLM, output it in a new text file, and say "it doesn't have any copyright anymore because it was generated by an LLM", then copyright is essentially dead as a concept.
If copyright is dead, then there is no need for licenses anymore: people can remove your copyright anyway by laundering your code through an LLM.
> if it is legitimate to take copyrighted code, pass it through an LLM, output it in a new text file, and say "it doesn't have any copyright anymore because it was generated by an LLM", then copyright is essentially dead as a concept.
But it's not legitimate. That's what I said. If you (a human) take copyrighted code and redistribute it against the terms of the license, you are guilty of a crime in most places.
That was my point: whether the language model was built legally or not, if you control such a model, use it to produce copyrighted code, and then distribute that code, you will be held to have violated copyright.
I think there's a lot of confusion here because some on the thread seem to believe a large language model can in and of itself be guilty of a crime. It can't. The trainer may be guilty, or the user could be guilty, or both could be guilty. The model simply is. It might not be redistributable depending on how courts interpret the training procedure (remains to be seen).
I feel like you are being pedantic here. If we all agree that an LLM trained on copyrighted material without authorization is a copyright-laundering machine, then the consequence is that copyright is dead.
I don't really care who we should theoretically punish: once it's out there, it's out there for good. Maybe it's already too late, and the LLM people just broke copyright for everybody (or worse: they broke it for honest people). So much for making the world a better place.
If copyright is dead, then there is no need for licenses anymore: people can remove your copyright anyway by laundering your code through an LLM.