Hacker News

The issue is treating "AI" as something special. It is just derivative work.

They are called large language _models_ for a reason. They are just statistical models of existing work.

If anybody seriously thought they were intelligent, they'd be arguing for giving them personhood and we'd see massive protests akin to pro-life vs pro-choice. People (well, ML companies) only use the word "intelligence" as long as it suits their marketing purposes.




I agree, but it's derivative work that infringes almost indiscriminately on all publicly available cultural goods and can be very useful while doing so. That's why I think copyleft is a somewhat fitting consequence.


So you think all work produced with the help of LLMs should be required to be open sourced under a copyleft-like license?

That is actually an intriguing idea and at least aligned with the reasons I use AGPL for my code.


Honestly, I only thought about the models, assuming that work that uses them will also inevitably be incorporated into them. But maybe it is a good idea to actually include the produced works. It could create a nice incentive for rich corps to pay artists to create something AI-free / copyleft-free.

If we could extract correct attribution and even licensing out of work that was produced with AI, I don't think it would help that much. I would even assume that especially in this case the rich would profit the most. They wouldn't mind having to pay thousands of artists for the pixels they provided to their AI-generated blockbuster movie. It would effectively exclude the poor from using AI for compliance reasons. Or even worse, rich corps monopolize the training data and can then create content practically for free, while indies cannot use AI because they would have to pay traditional prices or give the rich corps money for indirectly using their training data.


> pay artists to create something AI free / copyleft free

I still don't think that's enough to be fair. If their work is used to produce value ad infinitum, then any one-time payment is obviously less than what they deserve.

The payment should be fractional compared to the produced value. And that is very hard to do since you don't know how much money somebody made by using the model.

> It would effectively exclude the poor from using AI for compliance reasons.

Again, this is only an issue if you're thinking in terms of one-time fixed payments.


I believe you are thinking in too short time frames. In 70 years this becomes a futile discussion. AI is a way to directly benefit from the explosion of free content that the next decades will bring. The only way to counter this in an ethical way is to establish some kind of enforced liberation of AI models, otherwise, as you say, only the rich will profit from this.


It's author's life plus 70 years, if you meant that. And TBH I am mostly interested in the author's life part anyway.

If it were possible to train AI models on just the public domain, then I am sure ML companies would have done so, because it's less effort than lobbying and risking lawsuits (though I am surprised how well creators have accepted that their work is used by others for profit without any compensation; I expected way more outrage).

Virtually all code relevant to training code-completion LLMs was written by people who are still alive or have been dead for far less than 70 years. We can try to come up with a better system over the next decades, but those people's rights are being violated right now.


I am curious why you are so critical especially when looking at code.

At least for Java there are search engines to look up code that calls libraries etc. Models could probably be trained on free code and then be fed the results of these search engines on demand, even through the client that calls the LLM:

Client -> LLM -> client automation API -> code search on the client machine to fill the context -> code generation by a model that was not trained on the code found by the search, which is merely used as context

Even if they only feed it code that is freely given, I think the quality of the output would approach current quality, and surpass it over time, especially when using RAG techniques like the above.
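A minimal sketch of that loop, using only the standard library; the file layout and the `generate` stand-in (which in practice would be an HTTP call to the LLM provider) are my own illustrative assumptions:

```python
import re
from pathlib import Path

def search_local_code(root: str, symbol: str, max_snippets: int = 3) -> list[str]:
    """Toy client-side code search: collect source files that mention the symbol."""
    snippets = []
    for path in Path(root).rglob("*.java"):
        text = path.read_text(errors="ignore")
        if re.search(rf"\b{re.escape(symbol)}\b", text):
            snippets.append(f"// {path}\n{text[:500]}")
            if len(snippets) >= max_snippets:
                break
    return snippets

def build_prompt(task: str, snippets: list[str]) -> str:
    """The found code only fills the context; the model itself was never trained on it."""
    context = "\n\n".join(snippets)
    return f"Context (retrieved on the client machine):\n{context}\n\nTask: {task}"

def generate(prompt: str) -> str:
    """Stand-in for the actual model call."""
    return "// completion based on a prompt of length " + str(len(prompt))
```

The key property is that the client-side search, not the training set, supplies the proprietary or restricted code, so the model can stay trained on free code only.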

Companies can also buy code for feeding the model, after all. So besides the injustice you directly experience right now over your own code probably being fed into AI models, do you fear/despise anything more than that from LLMs?


What is "free code"? Most code is either proprietary (though sometimes public), under a permissive license or under a copyleft license. The only free code is code which is in the public domain.

You could make separate versions of an LLM depending on the license of its output. If it has to produce public domain code, it can only be trained on the public domain. If it has to produce permissive code (without attribution), then it can be trained on the public domain and permissive code. If copyleft, then on those two and copyleft code (though that does not solve attribution).
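That tiering amounts to a simple lookup; here is a sketch where the three license category names are illustrative labels, not SPDX identifiers:

```python
# Licenses a model's training set may contain, keyed by the license
# its output must be released under (attribution deliberately not handled).
ALLOWED_TRAINING_LICENSES = {
    "public-domain": {"public-domain"},
    "permissive":    {"public-domain", "permissive"},
    "copyleft":      {"public-domain", "permissive", "copyleft"},
}

def may_train_on(output_license: str, source_license: str) -> bool:
    """True if a model producing output_license code may train on source_license code."""
    return source_license in ALLOWED_TRAINING_LICENSES[output_license]
```

Each tier strictly contains the one above it, mirroring how license obligations only ever accumulate downstream.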

> Companies can also buy code for feeding the model after all.

We've come a long way since the time slavery was common (at least in the west) but we still have a class system where rich people can pay a one time fee (buying a company) and extract value in perpetuity from people putting in continuous work while doing nothing of value themselves. This kind of passive income is fundamentally unjust but pervasive enough that people accept it, just like they accepted slavery as a fact of life back then.

Companies have a stronger bargaining position than individuals (one of the reasons, besides defense, why people form states: to represent a common interest against companies). This is gonna lead to companies (the rich) paying a one-time fee, then extracting value forever, while the individual has to invest effort into looking for another job. Compensation for buying code can only be fair if it's a percentage of the value generated by that code.

> So beside the injustice you directly experience right now over your own code probably being fed into AI models, do you fear/despise anything more than that from LLMs?

Umm, I think theft on a civilization-level scale is sufficient.


> Umm, I think theft on a civilization-level scale is sufficient.

As long as everybody can also benefit from it, I see it as some kind of collective knowledge sharing.

As you stated in the paragraphs before, LLMs may lead to an escalating imbalance unless the wealth distribution changes, or unless the models are shared for free as soon as a critical mass of authors is involved, regardless of who owns the assets.


It's about proportions.

1) The current system is that the rich get richer faster than the poor.

2) You're proposing a system where everybody gets richer at the same rate, at best (but it probably devolves into case 1 IMO)

3) I am proposing a system where people get richer at the rate of how much work they put in. If a rich person does not produce value, he doesn't get richer at all. If a poor person produces value he gets richer according to how many people benefit from it. If a poor person puts in 1000 units of work and a rich person puts in 10 units of work to distribute the work to more people (for example through marketing), they get richer at comparative rates 1000:10.
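The 1000:10 example amounts to splitting whatever value the work generates in proportion to work put in; the numbers below are purely illustrative:

```python
def proportional_payouts(work_units: dict[str, float], total_value: float) -> dict[str, float]:
    """Split generated value in proportion to work put in, not to assets owned."""
    total_work = sum(work_units.values())
    return {who: total_value * units / total_work for who, units in work_units.items()}
```

With 1000 units of creation and 10 units of distribution, the creator receives 100x whatever the distributor receives, however large the total value grows.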

My system is obviously harder to implement (despite all the revolutions in history, societies have always devolved to case 1, or sometimes case 2 that devolved into 1 later). It might be impossible to implement perfectly but it does not mean we should stop trying to make things at least less unfair.

---

We're in agreement that forcing companies who take (steal) work from everyone to release their models for free is better than letting them profit without bounds.

However, I am taking it way further. If AI research leads to fundamental changes in society, we should take it as an opportunity to reevaluate the societal systems we have now and make them more fair.

For example, I don't care who owns assets. I care about who puts in work. It's a more fundamental unit of value. Work produces assets after all.

---

And BTW making models free does not in any way help restore users' rights provided by AGPL. And I have yet to come across anybody making a workable proposal how to protect these rights in an age where AGPL code is remixed through statistics into all software without making it also AGPL. In fact, I have yet to find anybody who acknowledges it's a problem.





