
I'd say that the difference is in the ownership of the data the tool uses as it does its job. In the case of Photoshop, that's the algorithms presumably owned by Adobe plus the bytes you feed it. In the case of LLMs, it's the model, which was built from data with disputed ownership, plus the bytes you feed it.

Consider a hypothetical LLM that was trained on data having a single undisputed copyright owner. What would be the legal status of its output?




> Consider a hypothetical LLM that was trained on data having a single undisputed copyright owner. What would be the legal status of its output?

In that case the tool would almost certainly generate a derivative work, which would be a copyright violation. It's the same as if I took such strong inspiration from a certain song that I wrote a new one with the same melody and chords, which has happened plenty of times.

But generally LLMs are most useful when they're trained on a broad enough corpus to avoid these issues.


It's not really about the size of the corpus but about its ownership.

Anyway, now consider an LLM that was trained on two corpora with two distinct undisputed owners.


What difference does that make? Surely the question is whether the work that is produced is derivative or not.


That is the question indeed.


But then the presence of the LLM doesn't change anything, right?



