
I'd say that the difference is in the ownership of the data the tool uses as it does its job. In the case of Photoshop, that's the algorithms presumably owned by Adobe plus the bytes you feed it. In the case of LLMs, it's the model, which was built from data with disputed ownership, plus the bytes you feed it.

Consider a hypothetical LLM that was trained on data having a single undisputed copyright owner. What would be the legal status of its output?




> Consider a hypothetical LLM that was trained on data having a single undisputed copyright owner. What would be the legal status of its output?

In that case the tool would almost certainly generate a derivative work, which would be a copyright violation. It's the same as if I took such strong inspiration from a certain song that I wrote a new one with the same melody and chords, which has happened plenty of times.

But generally LLMs are most useful when they're trained on a broad enough corpus to avoid these issues.


It's not really about the size of the corpus but about its ownership.

Anyway, now consider an LLM that was trained on two corpora with two distinct undisputed owners.


What difference does that make? Surely the question is whether the work that is produced is derivative or not.


That is the question indeed.


But then the presence of the LLM doesn't change anything, right?



