This is very cool, and I was reading the paper when I noticed they call it an open-source model, but wouldn't a better term be "open-weight model", since to make the model weights you'd need the document sources and lots of compute? Or did they actually open source it fully so people can pull the same sources in to build the same weights and I didn't get the memo?
I find it confusing to release weights under a source-code license too: Apache 2.0, which is used for plenty of open models, spends a lot of time talking about code, which should be unrelated to weights (I am not a lawyer, though). But perhaps one should look at weights as "firmware" necessary to run a model? I am not sure.
Although I am getting tired of mentioning it at this point, LLaMa is not open by any reasonable definition. It is available. But as a sibling points out, it is rude of Facebook to call it something it blatantly is not, even if it is more open than, say, Clos^WOpenAI's offerings. In addition, it is very rude towards the people behind Mistral, GPT-J, etc., who do honour the tradition of open science and open source, to place your work next to theirs in your papers and marketing. There is a great word to describe what Facebook is doing: appropriation.
Nor is releasing LLaMa the way Facebook does honouring their "commitment to open science", as its license also violates the principles of open science. This was objectively false when they stated it six or so months ago, and even after OSI and many others called them out, they still have not adjusted their messaging. I am frankly considering them bad-faith actors at this point.
The best argument in favour of Facebook doing what they are doing is that their model is more open than the closed ones. Which is fair (pun not intended). But calling your dog a cat because it is more akin to a cat than a fish is still misleading, and rightfully surprises and upsets cat lovers, even if dogs are fine pets in their own right.
Many clearly think of the weights as a source-like artifact.
Also, the model's architecture is sufficiently documented, & supported by open code, for others to fine-tune it & run it for generation, which to me justifies describing the model as "open", even if the entire process for creating a new set of high-quality weights isn't. (That's in contrast to "Open"AI's GPT4, about which many architectural details are undisclosed.)
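For what it's worth, here's a minimal sketch of what "running it for generation" with open code can look like, assuming the Hugging Face transformers library and access to the gated meta-llama/Llama-2-7b-hf checkpoint (the model ID and parameters here are illustrative, not anything from the paper):

    # Minimal illustrative sketch: running a Llama model for generation
    # using the open transformers library. The model ID is an assumption;
    # Llama 2 checkpoints are gated behind acceptance of Meta's license.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "The phrase 'open source' means"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))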
Note also: as I understand it, there's so much parallelism, algorithmic randomization, & even floating-point-implementation instability in how such models are trained that even having the exact same training corpus wouldn't be enough to ensure the same final weights. That would require both Facebook & reproducers to do every calculation, in every stage of preparation, in identical order on identical hardware - a constraint that'd make typical parallel/distributed training optimizations (subject to all sorts of CPU/IO/network jitter) impossible.
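To make the floating-point point concrete: even plain IEEE-754 addition isn't associative, so a parallel reduction that merely regroups the same operands can round differently. A trivial Python illustration (nothing model-specific assumed here):

    # Floating-point addition is not associative: regrouping the same
    # operands, as a parallel reduction does, changes the rounded result.
    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c)                  # 0.6000000000000001
    print(a + (b + c))                  # 0.6
    print((a + b) + c == a + (b + c))   # False

    # Accumulated over billions of operations during training, such tiny
    # divergences compound, so bit-identical weights would require
    # bit-identical operation ordering.
    print(sum([0.1] * 10) == 1.0)       # False: rounding error accumulates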
I suspect a lot of people like me are feeling quite a bit "nerd sniped" while trying to read the paper. I am just suggesting "open weights" is a more fitting term than the "open source" I saw in the paper; I was not talking about EXACT reproducible builds, which I agree are probably a stretch with current tech (though obviously not impossible with new tech, if possibly slower, and not obviously needed anyway). I also know about OpenLLaMA and others, but that's not Meta's work, and it's not the work in the paper. I'm pretty sure a lot of the core principles Llama uses are drawn from OpenAI's published research, but I agree there has been a lack of openness recently, e.g. people having to speculate about even core things like how they presumably use a mixture of experts. Still, "open source" is not the right term for Meta's models just because OpenAI is not being open right now. I mostly agree that the model itself is somewhat open, but I really think "open weights" is a much more reasonable term than "open source".
> Many clearly think of the weights as a source-like artifact.
I believe releasing the weights is necessary for an open-source AI project, but don’t consider the weights to be source.
Indeed, open source is not just about the code. For instance, the OSI definition[0] states:
> The license must allow modifications and derived works
If a company released source code for a project, but it was written in a new language whose compiler they kept private, then making derived works would be extremely hard. Someone would have to reverse-engineer how the language works from source examples, and reimplement a compiler with similar performance. In that case, in my view, while the code was open, the project would not be open-source.
The same is true of AI models: people can in principle reverse-engineer the training code and replicate a training run to get the weights, but that takes enormous compute, so without the released weights, the project is not open-source.
On the flip side, if the inference code is open and the weights are public, but the training code is not, it is similar to an open-source project built with a proprietary compiler (think C# before 2014). The project is open-source, but the training is not.
While I like the Open Source Initiative's early & principled stake-in-the-ground, as a matter of usage, many things get casually called 'open source' that don't fully fit the OSI 'Open Source Definition'.
And, with regard to the Llama models, it seems to me that all the actual computer-language "source code" to run & train them is available. The grandparent post's specific objection, the non-availability of the full training-data corpus of non-source-code text, isn't clearly in violation of the OSI's 10-point definition.
There is of course a different problem, a part of Llama's licensing that does clearly violate the OSI Open Source Definition: its "Additional Commercial Terms" preventing just Meta's biggest competitors from using it, which is discrimination against persons or groups.
"Rude" is a strangely chosen word here. Rude towards whom? I could maybe accept "misleading", but as a person who don't believe OSI has an ownership of the term "open source", I don't think it's misleading either.
I used the word "rude" carefully here, because I'm not confident that I can make a case for it being illegal or even necessarily misleading (though I personally think it is) - but I'm happy to declare it "rude", partly because rudeness is in the eye of the beholder, so I get to decide for myself whether I think something is rude.