"training data is totally a violation of copyright"
This really isn't clear because cognition is treated as a special exception to copyright. Every thought we have is derivative of everything we've seen before to some degree; reading a book makes our brains a derivative work. But we recognize that cognition is special.
With machines we tend to apply a strict test: Did copyrighted material go in? If so, the output is almost certainly derivative.
With human brains, with cognition, it isn't enough to prove that a person has consumed a copyrighted work prior to having a thought -- instead we judge every thought individually as to its originality.
If we are in a position to apply similar cognitive rules to an LLM then the weights won't be derivative works and we will judge each output as to its originality rather than simply assume.
"This really isn't clear because cognition is treated as a special exception to copyright."
Actually, no. It's considered a transformative use. If you memorize a copyrighted play or piece of music and then perform it in public, that's a copyright violation. It's the literalness of the copy that matters.
No, that's totally incorrect, we do not consider every observation a "transformative use" as applied to the human mind. If you memorize a copyrighted play and write another play it is NOT inherently a copyright violation of everything which has come before. We just don't do that.
The new play is judged as to its originality.
People who have seen a play (everybody) are allowed to write new plays which aren't beholden to the copyright of the first play they've ever watched.
>> "training data is totally a violation of copyright"
> This really isn't clear because cognition is treated as a special exception to copyright.
Human cognition; not the latest algorithms and their output, which some enthusiastic software engineers eagerly mistake for cognition. It's actually pretty clear.
> The open question is how to handle machines that mimic the process.
It's not really an open question, except for software engineers who've talked themselves into thinking of humans as computers. A machine is not a human mind, so does not benefit from the legal exceptions and rights granted to the latter.
I remarked on how human cognition is treated as a magical process with respect to copyright law.
This is just a legal fact. It has nothing to do with how an LLM operates internally, or whether an LLM is at all similar to a human mind in terms of internal mechanics.
> "The legal question of does "copyright goes away if your violation is big enough?"
1) no similarities have ever been demonstrated between large language models and human cognition, and until that happens (spoiler: never) there is no basis in comparing them like this.
2) even if they were somehow proven to be the same there is still no reason why the same standards need to be applied to computer programs and humans because computer programs do not have any rights or legal protections.
3) cognition is not a "special exception to copyright" because it is entirely unrelated. "Copy" "right" is who has rights to make copies. Your thoughts are not considered copies because they are intangible.
4) we do not "judge every thought individually as to its originality" because other people's thoughts are entirely opaque. Nobody is judging your thoughts, and if you think they are you need to take your medications.
"1) no similarities have ever been demonstrated between large language models and human cognition"
This is false. The LLM's entire purpose is to mimic cognition.
You could argue that the operation differs in important ways - of course. But the similarity of output is literally the entire point.
"2) even if they were somehow proven to be the same"
I didn't suggest they need to be the same, proven or otherwise. I think you're not understanding. The point is that the function is similar.
How it works doesn't necessarily matter.
"3) cognition is not a "special exception to copyright" because it is entirely unrelated. "
False as a matter of law.
"4) we do not "judge every thought individually as to its originality" because other people's thoughts are entirely opaque."
Also false as a matter of law. When you publish your thoughts (your works, your writing, whatever form they take), they are judged as to their originality if the question of who owns the copyright is raised.
"Nobody is judging your thoughts, and if you think they are you need to take your medications."
There's no need to be snarky and disingenuous.
From the comment guidelines: Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
>This is false. The LLM's entire purpose is to mimic cognition.
Purpose and mechanism are not the same thing. "Similarity of output" does not make it equivalent.
>I didn't suggest they need to be the same, proven or otherwise. I think you're not understanding. The point is that the function is similar.
Sure, go ahead and ignore all but half a sentence and then accuse me of missing the point.
>False as a matter of law.
Show me the court case where somebody was found to have violated copyright law by thinking about something.
>When you publish your thoughts
You don't publish your thoughts. You publish essays, internet comments, articles, videos, etc., based on what you are thinking, and those are subject to copyright law.
>There's no need to be snarky and disingenuous.
How dare you, I would never disingenuously tell somebody who thinks their thoughts belong to other people to take their psychiatric medications. Of course I did mean that they should be prescribed by a licensed physician, and looking back I regret not stating that explicitly.
"The LLM's entire purpose is to mimic cognition." is your counterpoint to me saying that no peer-reviewed source has ever demonstrated a similarity between LLMs and human cognition. I'm talking about mechanism and you're talking about purpose.
Thank you for saying what I was going to say to this person. I'm so fucking tired of seeing people who probably have never opened a neuroscience textbook talk about cognition.