When I read your comment, I trained my own mental model on your words. How is that any different? When a human reads words, they apply a sophisticated theory of mind to contextualize the writing and the mental state of the author. If anything, LLM fine-tuning is far less invasive than having a person read your writing.
The idea that reading a piece of text constitutes copyright infringement is ridiculous. Copyright isn’t some infectious thing. Reading copyrighted text doesn’t give the copyright holder a claim to the future creative work of the reader.
You want to restrict model training, I get it. The debate is still ongoing, but I'm confident that when these "copyright" claims work their way through the courts, the AI companies will come out on top.
> The idea that reading a piece of text constitutes copyright infringement is ridiculous.
No man, it's not ridiculous. If I write a program that copies someone's book and I try to sell the output, I'm infringing that copyright. I cannot sell a zipped version of the Harry Potter books. I feel like there are so many people weighing in on this discussion who haven't actually done any real-world copyright work.
You're going to have to let a lot of scientists know that, because they're still publishing papers with that understanding. I guess they should have consulted you first.