The second goal is muddying the waters and making people not care.
Say you're deciding between two programs (or AI models)[0], you prefer an open source one, a colleague prefers one that just pretends to be open. You say your choice is preferable because it's open, he says the same about his choice. Then you say the dreaded "well, actually" and either you sound like a fundamentalist or an asshole.
[0]: None of those are truly open source because they're all trained on stolen data. And see? Now I sound like a fundamentalist.
I was looking for a list of free AI models and I searched for “open ai models”, which is when I first understood the terrible genius of the “OpenAI” name.
If the (stolen) data is available to download, OK, that would be the accurate definition of an open AI model. But "accurately specified" is not enough, because you would need to trust that the person specifying it is doing so honestly. And I think we all know what happens to all that honesty when economic interests are in play.
The data is bound by licenses which affect how the resulting model can be used. I release most of my public code under AGPL so that, for most intents and purposes, anybody using it has to also make their code public and benefit society at large.
Now, with LLMs, anybody can launder my code and use it to build proprietary software for his own benefit without giving anything back. That is a violation of the spirit of AGPL and hopefully the law too.
Available doesn't excuse anything. I don't know why people say it like it matters.
When CBS lets you watch a show on their web site, even for free and anonymously, they still own the show and did not grant you any right to re-distribute or re-use it.
What AIs do is also not fair use, because that isn't just about the size of a quote but about usage. A discussion is fair use, excerpting simply to pluck a cherry and present it as your own is not.
Songs are copyrighted over the equivalent of a mere few bytes.
I shall write a book titled "The wizdom of BKW", and it will contain merely a single sentence plucked from each of many other famous and deeply insightful authors. Not a discussion or examination of them, and not even credited to any of them. The book will look like you asked me for advice and insights into human nature and philosophy, and all these gems of insight are my direct answer.
No single quote will be more than a sentence or two. A teeny tiny fraction of the 400-page books they came from.
I don't care if any law currently recognizes that as wrong, it is wrong.
Open source was always a corporate-friendly compromise, but it seemed like some of the people involved had a lot of integrity.
What we need is those open source people with integrity to put the smack down on those willfully abusing and destroying the terms.
If you can't do it with trademarks/certifications/licensing/memberships/etc., do it with mainstream journalism. As might be happening here, except The Register has long had rare insider knowledge, and is relatively niche. You need to get the message out to everyone who's not already in the know, including lawmakers.
(Incidentally, the FSF also has integrity, but, besides prompting open source by being zero-compromise -- which is fine in their case -- they have an additional challenge of seeming to be clinically incapable of advocacy in situations that are aligned.)
That compromise thing was like eons ago when folks like Bruce Perens and ESR tried to toe that fine line between commercial open source and free libre paradigms and were successful to a great degree.
But today, such nuance doesn't exist. The commercial ones have gone full commercial and have no qualms about it (thus the title of this post).
If this attitude continues, all commercial interests in FOSS will be seen with high scepticism unless they have a proven track record of being a good actor.
Do you think, if open source never existed, if there were only free software and non-free software, we wouldn't be arguing about whether AI corporations can truly call their free models free?
Companies always seemed much more wary of "free software" than of open source, probably because of the ambiguous meaning of "free" in English. Honestly, that is one of the reasons we have open source as a concept.
Companies like the flexibility in "open source", even companies who release code as GPL rarely talk about "free software", they are open source companies.
How could we? Free Software makes it clear that when you modify the Free thing and productize it, you have to share the modifications with the public under the same licensing. What's there to argue about? You're either doing that or you're not. If you find a loophole in the text, then the license gets updated, the loophole explicitly closed, and everybody who agrees moves to the new version.
> Free Software licenses and Open source licenses are essentially the same (apart a few odd examples).
Apart from every example of GPL software, which can't be used under the permissive terms of Open Source. The last person I replied to about this used the word "essentially" here, also. Is there a common source slogan for this belief?
Also, somebody should tell all of the people who keep rewriting GPL stuff in order to have an MIT version.
This is a technicality. Non-copyleft licenses can qualify as Free Software because they can be easily relicensed into Free Software (as well as into proprietary software.)
They qualify as Free Software simply because they meet (though do not protect like copyleft) all 4 freedoms. The relicensing is more of a secondary point.
Free is an ambiguous term. It might be free in code and price. Or it might be free in price, but closed source. It could be free for me as a private person, but not for business.
Is freeware free software? It is a rather murky term for me.
That's entirely the fault of the English language. The same term when translated into many other languages (including my first language) creates no such confusion - because they have different words for free as in free beer and free as in free speech.
The point here is that that linguistic peculiarity in just one language doesn't make the word 'free software' invalid or unsuitable, as long as 'free software' is a recognizable term (which it is). This is why FSF makes this explicitly clear with an entire article.
in English, the word "free" has not served well.. suggested alternative "libre" ... oh, except LOSS does not sound great! seems challenging right now.. "free" has failed IMHO .. it is literally mocked by finance people no? every adult in the US and elsewhere must pay bills.. "free" is failing as a label
I have worked at multiple companies that vilified open source anything, while building their entire businesses on Linux, Java, Debian, and thousands of other pieces of "OSI Approved" software.
It's because, in my experience, the majority of businesses want to take but do not want to feel any obligation to give back or support.
Which was the entire purpose of Open Source, from conception, and the only way it is distinct from other licenses. Open Source is like Free Software, except you can use it without giving anything away.
> Open Source is like Free Software, except you can use it without giving anything away.
No, Open Source and Free Software are two names for essentially the same thing. The Free Software Foundation has a preference for licenses which go beyond its own Free Software Definition [0] and which are also "Copyleft" [1], but does not define Free Software in a way which requires that it also be Copyleft.
> No, Open Source and Free Software are two names for essentially the same thing.
This is not substantially true, which is why I assume you've added "essentially" in here. Open Source is Free Software, because anybody can take it and make it anything they want as long as they comply with the minimal license terms. Open Source can be proprietary, too, if somebody takes it, complies with the minimal license terms, and makes it proprietary.
> This is not substantially true, which is why I assume you've added "essentially" in here.
No, it is. The OSI Open Source definition and the FSF Free Software definition are framed differently but require substantially the same things, and for virtually every license on which both have expressed an opinion, they have come to the same conclusion as to whether it meets each organization’s requirements.
Free Software does not require a license that prevents proprietary re-licensing, that is an additional separate concern beyond the Free Software definition (Copyleft); the FSF generally prefers copyleft licenses, but recognizes non-copyleft licenses as Free Software licenses.
You seem to be under the mistaken impression that copyleft is a requirement to meet the Free Software definition, but that has never been the case.
To be clear, Open Source and Free Software aren't licenses. They are philosophies. FOSS licenses come in two major varieties - copyleft (like GPL) and permissive (like MIT). It's possible for either type of license to conform to both open source and free software philosophies. In fact, the vast majority of FOSS licenses - both copyleft and permissive - are endorsed by both camps (OSI and FSF). Also, both camps reject licenses for similar reasons - like for having proprietary terms (as in case of BSL).
The ability to keep changes to oneself is a property of permissive licenses, not of open source. Open source software under copyleft licenses cannot be modified and distributed while withholding changes. Conversely, free software under a permissive license can be.
The real difference between free software and open source is in how they treat the software. FS camp considers software as something that should give the users total freedom over the computing devices they own. The software shouldn't constrain or exploit the end user in any manner. This of course needs the source to be open.
OSS camp established open source because they realized the advantages of 'open' source, but didn't like the emphasis on freedom. That's more in line with corporate philosophy - take advantage of unaffiliated talent to increase code volume and quality, without making any commitment to user freedom. This is why many companies completely avoid the term free software. It's also easy to find 'open source' code that's very exploitative towards users, despite being open and using FSF-endorsed licenses.
True, this needs clarification that currently doesn't exist for large models, where training costs many millions and the binary artifact is both precious and malleable, unlike ordinary compilation output.
Regardless of whether Meta chooses to adhere once OSI establishes its definition(s), they still deserve a paragraph of praise for what they're doing.
As a side note, OSI should also recognize that in the era of giant cloud providers, protection from predatory market participants is also a thing and should exist as a clear licensing option. The Mongo, Elastic and Redis drama could be avoided in the future if there was a clear option to protect the sustainability of authors without affecting the open source spirit for end users.
ps. I also believe that "Open <something>" should be a protected phrase, similar to how "Police", "Federal", "Government" or "Organic" are protected to not mislead the public, so we don't have things like the "OpenAI" nonsense.
I can more readily(?) accept ones which mis-label their announcements of "Open Source!!1 under My Awesome License 1.0beta" than I can rug-pulls. Look, if you wanna use some rights-harming license and just shit on the term "Open Source," that's bad, but from a certain perspective understandable if the marketing folks don't grok the nuances of Open Source. The world is filled with misguided people, and I can just command-w the window and never use your product
But if you accept contributions from the community for years, and ingrain your product in hundreds of thousands of workflows around the world, and only then decide "holy shit, salaries cost money, best yank our license" that should be a case of fraud and you should be civilly liable, in my opinion
Companies whose products’ licenses permit rugpulls exist because the company wants to have the option to rugpull. If you don’t want to have the rug pulled on you, don’t use products with such licenses.
An egregious example is thirdweb, who technically has the product open sourced, but it is written to not work without an API key and to phone home to their SaaS to check your API call limit.
It makes me sad because I was working on getting a team together to build a real open source and free alternative, but once they found thirdweb they all got discouraged, thinking that no one would understand why our real open product is different.
Direct consequence IMO of our failure to popularize good licenses in another concept like fair source that sits in-between open source and closed source. My small non-saas bootstrap company could not survive if it was OSS, but maybe fair source.
> The pair found that while a handful of lesser-known LLMs, such as AllenAI's OLMo and BigScience Workshop + HuggingFace with BloomZ could be considered open, most are not.
It's absolutely wild to think the deranged BigScience RAIL license, under which the Bloom LLM was released, is open in any way shape or form. It has more user-harming restrictions than basically any other LLM license out there.
By that logic, I can call any proprietary program 'open machine code' or 'open assembly'. If the artifact can't be built or modified easily, then it can't be considered open.
I believe often companies or rather decision makers are afraid of going fully open-source because they invested a lot of money into the product and are afraid some other company uses it, offers it cheaper and ultimately harms the originator.
So even though they might believe in open source, they put protections in place that ultimately lock it down and thus make it closed source, while trying to keep the impression of being open.
In our journey at AirGradient towards becoming fully open-source hardware (all code and hardware licensed under CC-BY-SA), we had the same concerns but ultimately decided to go all in and open up everything with an officially approved open-source license.
I believe there are a few important aspects and "protections" that are open-source compatible that help companies protect their investments.
Firstly, requiring attribution is compatible with open source and can give companies a lot of visibility. Competitors probably don't want to attribute another company and are thus less likely to clone.
Secondly, using a share-alike license also makes the code unattractive for many other companies to use.
Lastly, I believe the code itself is often not the valuable part compared to the brand value, employees, reputation, business model, network and implicit knowledge that a company builds up.
It really worked for us to go that way with a true open-source license and I hope many others will do it too.
There are already some easy to understand licenses like CC in place and I do hope that they also create awareness around "open washing".
At the same time, Facebook is making some of the best efforts toward open AI, so it's a bit hard to blame them. They are not perfect, but they still shared the most important artifact that came out of tens of millions of USD spent (or even more). They did not share the dataset, but it is still a major advance.
I attended the referenced talk by Dan Lorenc in Alpharetta this week. It was very interesting. He hammered on how many licenses flunk the OSI test despite claiming to be open source.
It’s easy. They’re draining the phrase “open source” of meaning while gaining by marketing themselves that way. It’s fraudulent but also just exploitative.
Article commenter points out that Meta is a funder of the OSI. We'll see if that affects how the OSI defines "open" AI models.
I find it funny how OpenAI was only indirectly mentioned. Still, I'm glad that this columnist is taking a principled stance by arguing against one of the more borderline cases.
> The Open Source Initiative (OSI) spells it out in the Open Source Definition, and Llama 3's license – with clauses on litigation and branding – flunks it on several grounds.
Anyone know specifically what he is talking about here?
The only things I'm seeing that I would consider to be clauses on litigation are one that terminates your license if you sue them claiming Llama 3 or its output violates your IP, and the fact that they have a choice of venue and choice of forum clause.
Several OSI approved licenses have "terminate on patent suit" clauses. Llama 3 is termination on IP suit rather than just on patent suit but I don't see anything in the OSD where that would make a difference.
There's stuff about trademarks, which I assume are the branding clauses he mentions. But I don't see anything obvious in the OSD that such clauses violate.
> If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
This seems harmless... until you ask what happens if you start a startup on top of Llama 3, do really well and later try to get acquired by one of the companies that had more than 700m active users on that date (Apple, Microsoft, Google etc)
> You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof).
That's a pretty huge restriction on ways you can use the models. The language "to improve any other large language model" is also incredibly vague.
> (B) prominently display “Built with Meta Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model name.
I love this one, it means that if you fine-tune a model for erotic furry fan fiction you HAVE to call it "Llama 3 Erotic Furry Fan Fiction Writer" or similar.
> You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof).
How exactly would they know if I do?
Also, it doesn't make any sense that they trained this model using whatever stuff they could download from the Internet but we somehow could not do the same with their models.
But they said "several grounds" in the article. Isn't that enough? Why would you expect them to explain exactly where and how? A license is just a vibe anyway, it's the spirit that's important.
I did check for myself. And failed to find anything in the clauses on litigation or branding that obviously violated anything in OSI's Open Source Definition (OSD).
Hence, the question.
Simonw's response points out some unusual clauses, and at least one of them looks like it might go against one of the requirements in the OSD but it is not a litigation or branding clause and the article specifically called out the litigation and branding clauses.
“open source” doesn't only mean what the “open source community” has memed it to mean
personally, I think the term should be avoided if it's not what the open source community has made a culture around
but I can't say it's weaselly corporate “open washing”, either. because it's the open source community that appropriated the term to mean a subset of free, open, commercial use licenses and everything digital that's necessary to replicate the product, not the other way around where corporations are suddenly using some legalese to turn it into a marketing term that's technically okay
(102) Software and data, including models, released under a free and open-source licence that allows them to be openly shared and where users can freely access, use, modify and redistribute them or modified versions thereof, can contribute to research and innovation in the market and can provide significant growth opportunities for the Union economy. General-purpose AI models released under free and open-source licences should be considered to ensure high levels of transparency and openness if their parameters, including the weights, the information on the model architecture, and the information on model usage are made publicly available. *The licence should be considered to be free and open-source also when it allows users to run, copy, distribute, study, change and improve software and data, including models under the condition that the original provider of the model is credited, the identical or comparable terms of distribution are respected.*
(103) Free and open-source AI components covers the software and data, including models and general-purpose AI models, tools, services or processes of an AI system. Free and open-source AI components can be provided through different channels, including their development on open repositories. For the purposes of this Regulation, AI components that are provided against a price or otherwise monetised, including through the provision of technical support or other services, including through a software platform, related to the AI component, or the use of personal data for reasons other than exclusively for improving the security, compatibility or interoperability of the software, with the exception of transactions between microenterprises, should not benefit from the exceptions provided to free and open-source AI components. The fact of making AI components available through open repositories should not, in itself, constitute a monetisation.
(104) The providers of general-purpose AI models that are released under a free and open-source licence, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available should be subject to exceptions as regards the transparency-related requirements imposed on general-purpose AI models, unless they can be considered to present a systemic risk, in which case the circumstance that the model is transparent and accompanied by an open-source license should not be considered to be a sufficient reason to exclude compliance with the obligations under this Regulation. In any case, given that the release of general-purpose AI models under free and open-source licence does not necessarily reveal substantial information on the data set used for the training or fine-tuning of the model and on how compliance of copyright law was thereby ensured, the exception provided for general-purpose AI models from compliance with the transparency-related requirements should not concern the obligation to produce a summary about the content used for model training and the obligation to put in place a policy to comply with Union copyright law, in particular to identify and comply with the reservation of rights pursuant to Article 4(3) of Directive (EU) 2019/790 of the European Parliament and of the Council (40).
---
In the articles, open-source is expressly referred to as release under an open-source license (see definition in recitals above):
---
[Article 2: Scope]
12. This Regulation does not apply to AI systems released under free and open-source licences, unless they are placed on the market or put into service as high-risk AI systems or as an AI system that falls under Article 5 or 50.
[Article 25: Responsibilities along the AI value chain]
4. The provider of a high-risk AI system and the third party that supplies an AI system, tools, services, components, or processes that are used or integrated in a high-risk AI system shall, by written agreement, specify the necessary information, capabilities, technical access and other assistance based on the generally acknowledged state of the art, in order to enable the provider of the high-risk AI system to fully comply with the obligations set out in this Regulation. This paragraph shall not apply to third parties making accessible to the public tools, services, processes, or components, other than general-purpose AI models, under a free and open-source licence.
[Article 54: Authorised representatives of providers of general-purpose AI models]
6. The obligation set out in this Article shall not apply to providers of general-purpose AI models that are released under a free and open-source licence that allows for the access, usage, modification, and distribution of the model, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available, unless the general-purpose AI models present systemic risks.
And internally done in an attempt to get someone with AI resources to build blatantly non-AI functions by sticking them onto something with no or very little genuine AI angle.
To be fair, any product that relies on "AI" in naming or advertising is "washed" in some way -- AI is simply a marketing term, not a technical one. Especially considering that it covers so many different (albeit related) things -- LLMs, image generation, computer vision, machine learning etc -- that it has become completely void of any useful meaning.
It's just a very wide category that is basically meaningless nowadays because it applies to everything.
Marketers are even restrained in using it, because applying it everywhere it could go would sound insane and cringe. But it is a technical term, that technically applies to all those things people put it on.
Cue the several weasels who regularly turn up, arguing that “Open Source” can mean whatever they say it means, since they don’t accept the OSI definition.
It’s different when it’s a regular word, used for ages. However, the term “Open Source” (as applied to software) was created by the OSI to explicitly mean exactly the OSI definition, no more, no less. The OSI definition was based on the Debian Free Software Guidelines, which Debian had to write because, IIRC, at the time not even the FSF had a strict definition of what constituted free software, and Debian needed some lines to be drawn in order to know what they did and did not want to distribute on Debian CDs. Claiming something is “Open Source” but not OSI-approved is like claiming something is “legal” just because you personally think it’s acceptable, even when the actual law does not agree. Some terms come with strict definitions.
> the term “Open Source” (as applied to software) was created by the OSI
This is historical revisionism, and it's especially terrible that you'd call people "weasels" for correcting it. The term "open source" (as applied to software) was in use prior to the existence of the OSI, and that's explicitly why the OSI wasn't able to obtain a trademark on the term. The term meant something roughly equivalent to how we use "source available" today.
Even if we would assume that everything you say is true, does this make it reasonable to claim, today, that we can call something “Open Source” if it isn’t OSI-approved? No, I would say that, today, “Open Source” is what OSI says it is. Only weasels try to claim otherwise; i.e. the only people I see doing it are weasels who are trying to defend the indefensible by arguing the definition of words.
> Even if we would assume that everything you say is true
It's not my blog post, so you don't have to take my word for it. But if you disagree with that post author's findings, perhaps you could indicate what you disagree with. The post extensively links citations/sources.
> does this make it reasonable to claim, today, that we can call something “Open Source” if it isn’t OSI-approved?
OSI isn't the boss of me, and I see no reason to let them dictate the meaning of terminology that they didn't invent and don't hold a trademark for. The two main founders of OSI also haven't been involved with it for quite some time, and besides, one of them regularly makes politically-charged comments that I find repulsive. Why exactly are we putting this random small non-profit on a pedestal?
Personally, I stick to "source available" when referring to non-OSI licenses, but that's strictly to avoid getting shouted at by people who inexplicably treat the OSD like a holy law from the almighty. I think the industry would be a lot healthier if we avoided these extreme views.
> Only weasels try to claim otherwise; i.e. the only people I see doing it are weasels who are trying to defend the indefensible by arguing the definition of words.
It sure sounds like you think anyone who disagrees with your point of view is a weasel, even if they have a well-researched reason for the disagreement!
Even if there were some usage of “Open Source” prior to the OSI, the OSI announcement and consequent media storm completely obliterated any prior meaning the term might have had.
> OSI isn't the boss of me
What reason can you have to use the term “Open Source” today to mean something other than what OSI defines, other than to weaselly mislead?
I certainly don't recall a "media storm" about the OSI in 1998, and I was a hobbyist DOS software programmer who was active on BBSes, newsgroups, and web forums at the time. I'm sure it was a bigger deal among the Linux and BSD and Free Software folks, but that was a relatively small niche at the time, compared to the software industry as a whole.
In any case, OSI didn't create this term, they were denied a trademark on it, and they didn't actually create any of these licenses. They aren't the government, and they aren't the dictionary. The default state of affairs is to not care about their definition. You need to give reasons why we should care about their arbitrary definition. A "media storm" is just a marketing campaign, and that hardly seems like a good reason for an entire industry to strictly follow something.
I'm especially disturbed by how some strict adherents of the OSD seem to go out of their way to attack non-FOSS licenses and the software authors who adopt them. The negative comments on the recent Fair Source threads here seem to indicate that any attempt to adopt non-OSD source-available licensing will be attacked as "open washing", even if the authors don't use the word "open".
I don't understand why people feel justified to act as self-appointed Terminology Police on behalf of an arbitrary definition from a random small nonprofit. This is cult-like behavior.
The OSI announcement was riding the wave of the Netscape source code release announcement, and the wave even grew somewhat. Everybody wanted to talk about this new phenomenon, and now they had a good term to go with it: Open Source. The media coverage, as I recall, was huge. Not an ad campaign; actual media coverage.
There can be fewer genetic differences between people from (for example) Asia and Africa than between actual family members.
But I feel like the intent of the word was very much not vague. It was initially just about the optical differences between people that were born in different areas of the world - which are/were very easy to discern ~150 yrs ago when it was extremely rare for there to be "interracial" offspring.
But that's mostly down to people being people and not caring about these differences at large. Give it a few hundred years and these physical differences such as skin color, average body sizes etc will have gone away, too. At least I think they will.
So I feel that this is a great example of another word that's become so charged with political meaning that the original meaning has been lost or at least changed along the way. And it continues to lose it with every kid that has parents of varying origin.