What else could it be?

Ajedi32 · 2025-02-21T21:02:04 1740171724

An original composition based on a statistical analysis of the training data. Statistical data about a copyrighted work obviously isn't necessarily a derivative of that work. Otherwise Tolkien could sue me for telling you how many times The Lord of the Rings uses the word "the".

rasz · 2025-02-22T01:35:50 1740188150

Can it reproduce training data? Then it's not analysis but compression, lossy compression.

Ajedi32 · 2025-02-24T14:32:43 1740407563

For most LLMs, with most works, no.

If you trained an LLM repeatedly on nothing but the text of LOTR until it could reproduce the books verbatim and then tried to sell copies of that LLM, then I agree that would be blatant copyright infringement, yes.

monocasa · 2025-02-21T18:54:25 1740164065

The industry is banking on Authors Guild v. Google serving as precedent in such a way that it's functionally transformative enough to be a completely new work.

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

I think they have about a coin flip of a chance that it passes muster in the courts.