
This statement from Microsoft is just asking for a copyright infringement lawsuit because the courts have been very clear that web "content" is copyrighted unless it is explicitly placed in the public domain or old enough to no longer be under copyright.

Authors of open source code should consider adding explicit restrictions to their license barring the use of their code to train AI. This would make it easier to file lawsuits against Microsoft and others of their ilk who think they can train their AI with other people's work without fair compensation.






> Authors of open source code should consider adding explicit restrictions to their license barring the use of their code to train AI. This would make it easier to file lawsuits against Microsoft and others of their ilk who think they can train their AI with other people's work without fair compensation.

I see no reason to expect that this would alter or achieve anything. The wide-scale machine learning that’s been happening is entirely dependent on fair use exemptions from copyright. They’re not using it under your license; in fact they can’t, because current machine learning techniques and open source licenses already make it fundamentally impossible for them to comply. So what you put in your license should be completely irrelevant.

No, if the fair use exemption is ever struck down, the entire field is dead in the water until (a) a change in the legal system, or (b) services like GitHub start demanding an additional license as part of their terms of service for the purpose.


No one would let AI get shut down in the US; there’s just too much at stake. Even if we don’t like what’s going on, we’ll take a measured approach to regulating it, because otherwise it will just go overseas.

Doesn't the GPL do this already? Doesn't it already say that code derived from GPL code should be GPLed? So does that include any code produced by an LLM based on GPL code?

That would seem to be a logical implication assuming courts reject claims that "everything on the internet is public domain" or that training an LLM on copyrighted material constitutes "fair use" of the copyrighted material.

I suspect it would technically be infringement even for MIT licensed code because the original author's copyright notice would presumably be missing.


Any such lawsuit would be settled out of court, with no admission of guilt, and no damaging information coming out via introduction into public evidence.

"Authors of open source code should consider adding explicit restrictions to their license barring the use of their code to train AI."


