
The second goal is muddying the waters and making people not care.

Say you're deciding between two programs (or AI models)[0]: you prefer an open source one, and a colleague prefers one that just pretends to be open. You say your choice is preferable because it's open; he says the same about his. Then you say the dreaded "well, actually" and you sound like either a fundamentalist or an asshole.

[0]: None of those are truly open source because they're all trained on stolen data. And see? Now I sound like a fundamentalist.




I was looking for a list of free AI models and I searched for “open ai models”, which is when I first understood the terrible genius of the “OpenAI” name.


I'm not sure why training on stolen data would disqualify them if said data were available or, at minimum, accurately specified.


If the (stolen) data is available to download, OK, that would fit an accurate definition of an open AI model. But "accurately specified" does not, because you would have to trust that the person specifying it is being honest. And I think we all know what happens to that honesty when economic interests are at stake.


The data is bound by licenses which affect how the resulting model can be used. I release most of my public code under AGPL so that, for most intents and purposes, anybody using it has to also make their code public and benefit society at large.

Now, with LLMs, anybody can launder my code and use it to build proprietary software for his own benefit without giving anything back. That is a violation of the spirit of AGPL and hopefully the law too.


Available doesn't excuse anything. I don't know why people say it like it matters.

When CBS lets you watch a show on their web site, even for free and anonymously, they still own the show and did not grant you any right to re-distribute or re-use it.

What AIs do is also not fair use, because that isn't just about the size of a quote but about usage. A discussion is fair use, excerpting simply to pluck a cherry and present it as your own is not.


Not a lawyer, but my (possibly poor) understanding was that courts were leaning towards it indeed being fair use?


Songs are copyrighted over the equivalent of a mere few bytes.

I shall write a book titled "The wizdom of BKW", and it will contain merely a single sentence plucked from each of many famous and deeply insightful authors. Not a discussion or examination of them, and not even credited to any of them. The book will look like you asked me for advice and insights into human nature and philosophy, and all these gems of insight are my direct answer.

No single quote will be more than a sentence or two. A teeny tiny fraction of the 400-page books they came from.

I don't care whether any law currently recognizes that as wrong; it is wrong.


Great point!



