As is often the case, the article gets the difference between open source and open weights wrong.
Also, they don't seem to understand that Apple's variant of the MIT licence is GPL compatible.
> This is Apple's variant of MIT. They've added wording around patents, which is why it gets its own shortname. Apple did not give this license a name, so we've named it "Apple MIT License". It is free and GPL compatible.
https://fedoraproject.org/wiki/Licensing/Apple_MIT_License
The lack of distinction is confusing but Apple have released both the training source and the inference source + models/weights at the same time — so at least both are true.
> RedPajama-V2 is an open dataset for training large language models. The dataset includes over 100B text documents coming from 84 CommonCrawl snapshots and processed using the CCNet pipeline.
> Dolma Dataset: an open dataset of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials.
> our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations
The distinction between the two terms is what is useful.
Imagine if people only referred to open source software as "free software", and there's no distinction made whether the software is free as in beer or free as in freedom.
Maybe both are useful? There's "open source" in the FSF sense, which isn't always as useful when talking about modern neural networks (remember when papers used to be published with "open source" Python code for initializing and training the model, but no training data or weights?) Then there's "in the spirit of open source" where the weights and training data are also GPL'd. And there's the whole range in between.
Having the training data available is nice, but for large models, having weights provided under a GPL-style license (or even an MIT-style permissive license) is far better in terms of "being able to modify the program" than having training data that you don't have enough compute resources to use. The distinction between the two, though, is also useful.
(I've even seen a few posters here argue that it's not really 'free software' even if everything's provided and GPL'd, if it would take more compute to re-train than they have available, which frankly I think is silly. Free-as-in-freedom software was never guaranteed to be cheap to build or run, and nobody owes you CPU time.)
Of course both things are useful in practice, and unless you’re a free-as-in-freedom software purist, free-as-in-beer software is also very useful!
But the point is exactly that the distinction matters, and conflating the terms doesn’t do either thing a favor (it also doesn’t really work well with “free software”, since the beer trope is needed to explain what you mean. “Libre” is at least unambiguous).
Having the training data is not useful just for retraining, but also to know what the model can reasonably answer in 0-shot, to make sure it is evaluated on things it hasn’t seen before during pretraining (e.g. winograd schemas), to estimate biases or problems in the training data… if all you have is the weights, these tasks are much harder!
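As a rough illustration of that last point, here is a minimal sketch (not from the paper) of one thing open training data enables: screening an evaluation example for contamination with simple n-gram containment. How you load the corpus and eval set is left abstract.

    def ngrams(text, n=8):
        # Lowercased whitespace tokens: crude, but a common normalisation.
        toks = text.lower().split()
        return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    def is_contaminated(eval_example, corpus_docs, n=8):
        # Flag the example if any of its 8-grams also appears in a training doc.
        probe = ngrams(eval_example, n)
        return any(probe & ngrams(doc, n) for doc in corpus_docs)

With only the weights, you are reduced to guessing at this kind of overlap from the model's behaviour.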
> Maybe both are useful? There's "open source" in the FSF sense,
The FSF uses the term free/libre software, and they do not like the vague term "open source". The problem is that in English "free" also means "costs zero", so it can be confused with freeware.
This is why OP is correct in pointing out the wrong term usage: you should not call GPL software freeware, or call freeware "free software"; it is wrong and causes confusion.
So for models it would be great if we could know whether a model is actually open source, open weights with no restrictions, or open weights with restrictions on top.
I’m shocked that Ars got this wrong and is helping Apple openwash their work. But I’m seeing this same utterly basic mistake with other tech media as well, like Venture Beat:
There's no open washing — Apple have released the code (both training and inference), the weights and a research paper explaining. It's about as open as you could expect from them.
I didn’t see training source code, but more like a promise to share insights. Am I missing something? Regardless another reason this isn’t open source is because it used a proprietary license instead of an OSI approved open source license.
Why such a negative take? Open washing is similar to green washing, a practice where companies pretend to be environmentally friendly for marketing purposes. To me the term makes immediate sense. What am I missing?
Apple is going exactly where I predicted they’d go. They’ve already had machine learning hardware on their phones for multiple generations, mostly “just” doing on-device image categorisation.
Now they’re moving into more generic LLMs, most likely supercharging a fully local Siri in either this year’s or next year’s iOS release.
And I'm predicting that an opportunity for 3rd party developers to plug in to said LLM and provide extra data will be announced during WWDC. So you can go "Hey Siri, what's the latest news on CNN" -> the CNN app provides data to the Siri language model in a standard format and it tells you what's going on.
I wonder how hard it would be for Apple to rebuild Shortcuts around an AppleScript backend to allow power users the ability to edit the scripts directly.
I don’t see any source for that in their paper. It just mentions a deduplicated version of the Pile, with a citation linking to the original Pile paper itself.
[15] Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.
I switched from Android to iOS and back to Android because the keyboard typing predictions on iOS were so bad in comparison. It felt like I was using a phone from 10 years ago.
Did the same switch (Android -> iOS), but found SwiftKey[0] is also available on iOS. Better (or more Android-ish?) typing experience.
They also solve the issue for us using multiple languages on a daily basis (first and second mother tongue + English), without having to cycle through all the iOS built-in keyboards just because you want to mix in a few English words/phrases.
This sounds like it is fully reproducible: "it also included reproducible training recipes that allow the weights (neural network files) to be replicated, which is unusual for a major tech company so far"
> The eight OpenELM models come in two flavors: four as "pretrained" (basically a raw, next-token version of the model) and four as instruction-tuned (fine-tuned for instruction following, which is more ideal for developing AI assistants and chatbots)
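For what it's worth, the difference shows up directly in how you'd load them. A sketch using the checkpoint names as published on the Hugging Face Hub at release (adjust if they've moved; trust_remote_code is needed because the architecture ships as custom code):

    from transformers import AutoModelForCausalLM

    # Raw next-token model: good for continuation, not for following instructions.
    base = AutoModelForCausalLM.from_pretrained(
        "apple/OpenELM-270M", trust_remote_code=True)

    # Instruction-tuned sibling: fine-tuned to follow prompts, better suited
    # for assistant/chatbot use.
    chat = AutoModelForCausalLM.from_pretrained(
        "apple/OpenELM-270M-Instruct", trust_remote_code=True)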
I find it fascinating how late Apple is and how they always release some low quality products with a twist.
They really give me Nintendo vibes. They can't compete on hardware, software, etc... So they put their resources into weird things that let their fanatical users say:
"Apple has the best X"
Since no one actually cares to have the best X, it's nearly useless, but... it's great for marketing and post-purchase rationalization.
I would be curious how much less computationally expensive these models are. Full-blown LLMs are overkill for most of the things I do with them. Does running them affect battery life of mobile devices in a major way? This could actually end up saving a ton of electricity. (Or maybe induce even more demand...)
I bet these can all run on ANE. I’ve run gpt2-xl 1.5B on ANE [1] and WhisperKit [2] also runs larger models on it.
The smaller ones (1.1B and below) will be usably fast and with quantization I suspect the 3B one will be as well. GPU will still be faster but power for speed is the trade-off currently.
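If anyone wants to poke at the ANE angle from Python, a minimal sketch with coremltools (the .mlpackage name is hypothetical, you'd convert a checkpoint yourself first, and CPU_AND_NE only asks Core ML to prefer the Neural Engine rather than guaranteeing every op lands on it; it needs a fairly recent coremltools and OS):

    import coremltools as ct

    # Load an already-converted model and ask Core ML to keep it off the GPU,
    # preferring CPU + Apple Neural Engine instead.
    model = ct.models.MLModel(
        "OpenELM-1_1B.mlpackage",                 # hypothetical converted checkpoint
        compute_units=ct.ComputeUnit.CPU_AND_NE,  # prefer CPU + ANE
    )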
I wonder if there could be a tiered system where a "dumber" LLM fields your requests but passes them on to a smarter LLM only if it finds its confidence level below some threshold.
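Something like that is easy to sketch. Assuming small_lm and large_lm are hypothetical callables that return a completion plus an average token log-probability, the router could be as simple as:

    def answer(prompt, small_lm, large_lm, threshold=-1.5):
        # Let the cheap on-device model try first.
        text, avg_logprob = small_lm(prompt)
        if avg_logprob >= threshold:   # confident enough, keep the local answer
            return text
        # Otherwise escalate to the bigger (slower or remote) model.
        return large_lm(prompt)[0]

Picking the threshold is the hard part, since LLM confidence tends to be poorly calibrated.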