Same issue here, even on a topic where I have 10 years of experience, both academic and professional. The instant contrarianism without any experience is quite astounding.
I'd claim intuitively that having all bits 0 is a proper starting point, and more efficient if there were any constraints on addresses, which was likely the case historically.
I'm happy to hear any dissenting opinions if this is inaccurate.
In what sense? Copilot isn’t a derivative work in the sense these licenses usually are understood to mean. And given that they’re open source code bases I expect licenses to explicitly disallow things, and consider anything not explicitly disallowed as permitted.
> Copilot isn’t a derivative work in the sense these licenses usually are understood to mean
The phrase “derived work” is, IIUC, a phrase from copyright law. And you’d have a hard time convincing me that Copilot-generated code is not a derived work from its training data.
> And given that they’re open source code bases I expect licenses to explicitly disallow things, and consider anything not explicitly disallowed as permitted.
That is very much not how copyright and licenses work. Copyright law gives the copyright holder the exclusive right to make copies of the work, to make derived works (and to do some other related things, like performing it publicly, etc.), so to do any of those things you need explicit permission, i.e. a license from the copyright holder. A license is not a list of things you are forbidden to do; on the contrary, it is a list of things you are permitted to do, which you would not otherwise be legally allowed to do according to copyright law.
Sure, but there are things you can do without a license because they're not copyright violations. You can read the work, learn from it, and sometimes make quotations under fair use.
This is a novel scenario, and it seems unclear how the courts will interpret it. Never mind what we think: will they decide it's a derivative work, or is it a transformative use?
“Fair use” is, technically, not actually permitted by copyright law. ISTR that “fair use” is only a defense you can use when you are being sued for copyright violation.
Suppose we create a new AI image generator, and use as training input every image ever made of a Disney character (official images by Disney, that is, no fan art), including every frame of every Disney movie. Could we just use the output images of that AI however we wanted to? (Notwithstanding trademarks.)
Looks like there is case law that fictional characters are protected if they are "sufficiently delineated." I don't see how that applies to code, though.
This is unclear. I have never seen an open source license that was explicit about this. Seems like a grey area.
It's not even clear how often training machine learning algorithms on code results in copyright violations. Copilot does have a setting to detect and disallow direct copying, but how well does it work?
This legal uncertainty is enough that I wouldn't advise using it, but maybe people who use it will be fine?
I'd want to know what was done with the data after the scanning. Perhaps a specific car is being sought and the data is discarded afterwards, which would be more ethical than keeping the data and attempting to profit off of it, regardless of whether it's "legal" and there's some attempt at "disrupting an industry."