Apple releases eight small AI language models aimed at on-device use (arstechnica.com)
182 points by MBCook 8 months ago | 61 comments



As is often the case, the article gets the difference between open source and open weights wrong.

Also, they don't seem to understand that Apple's variant of the MIT licence is GPL compatible.

> This is Apple's variant of MIT. They've added wording around patents, which is why it gets it own shortname. Apple did not give this license a name, so we've named it "Apple MIT License". It is free and GPL compatible.

https://fedoraproject.org/wiki/Licensing/Apple_MIT_License


The lack of distinction is confusing but Apple have released both the training source and the inference source + models/weights at the same time — so at least both are true.


Is the training data public as well?


Yes

> RedPajama-V2 is an open dataset for training large language models. The dataset includes over 100B text documents coming from 84 CommonCrawl snapshots and processed using the CCNet pipeline.

https://github.com/togethercomputer/RedPajama-Data

> Dolma Dataset: an open dataset of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials.

https://allenai.github.io/dolma/


This time Apple could say that they didn't release it for privacy reasons and this time it would be correct for once.


The concept of open-source for a million-dollar scale LLM is not very useful, especially if you don't provide the training set as well.

Open weights with a permissive license is much more useful, especially for small and midsize companies.


Publicly available datasets were used.

> our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations

https://arxiv.org/abs/2404.14619v1


The distinction between the two terms is what is useful.

Imagine if people only referred to open source software as "free software", and there's no distinction made whether the software is free as in beer or free as in freedom.


Maybe both are useful? There's "open source" in the FSF sense, which isn't always as useful when talking about modern neural networks (remember when papers used to be published with "open source" Python code for initializing and training the model, but no training data or weights?) Then there's "in the spirit of open source" where the weights and training data are also GPL'd. And there's the whole range in between.

Having the training data available is nice, but for large models, having weights provided under a GPL-style license (or even a MIT-style permissive license) is far better in terms of "being able to modify the program" than having the training data that you don't have enough compute resources to use. The distinction between the two, though, is also useful.

(I've even seen a few posters here argue that it's not really 'free software' even if everything's provided and GPL'd, if it would take more compute to re-train than they have available, which frankly I think is silly. Free-as-in-freedom software was never guaranteed to be cheap to build or run, and nobody owes you CPU time.)


Of course both things are useful in practice, and unless you’re a free-as-in-freedom software purist, free-as-in-beer software is also very useful!

But the point is exactly that the distinction matters, and conflating the terms doesn’t do either thing a favor (it also doesn’t really work well with “free software”, since the beer trope is needed to explain what you mean. “Libre” is at least non ambiguous).

Having the training data is not useful just for retraining, but also to know what the model can reasonably answer in 0-shot, to make sure it is evaluated on things it hasn’t seen before during pretraining (e.g. winograd schemas), to estimate biases or problems in the training data… if all you have is the weights, these tasks are much harder!
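To make the evaluation point concrete, here is a minimal sketch of how one might check whether a benchmark item already appears in the training data when the training set is available. It is only an illustration: the function names, the whitespace tokenization, and the choice of n-gram overlap as the contamination signal are all assumptions, not something taken from the OpenELM release.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(eval_example: str,
                       training_docs: Iterable[str],
                       n: int = 8) -> float:
    """Fraction of the eval example's n-grams that occur in any training doc.

    Hypothetical helper: a high value suggests the model may have seen
    this example (or something very close to it) during pretraining.
    """
    eval_grams = ngrams(eval_example, n)
    if not eval_grams:
        return 0.0  # example shorter than n tokens, nothing to compare
    seen: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        seen |= eval_grams & ngrams(doc, n)
    return len(seen) / len(eval_grams)
```

With only the weights, there is nothing to pass as `training_docs`, which is the asymmetry the comment above describes.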


>Maybe both are useful? There's "open source" in the FSF sense,

The FSF uses the term free/libre software, and they do not like the vague term "open source". The problem is that in English "free" also means "costs zero", so it gets confused with freeware.

This is why OP is correct in pointing out the wrong term usage: you should not call GPL software "freeware", or call freeware "free software". It is wrong and causes confusion.

So for models, it would be great if we could know whether a model is actually open source, open weights with no restrictions, or open weights with restrictions on top.


It's hard to read sarcasm online, but as far as I can tell, very few people realise there's a difference!


Linux is libre software. Facebook is gratis.


Also, many HN commenters don’t get the distinction between open source and open weights… not very surprising that Ars Technica doesn’t either.


I’m shocked that Ars got this wrong and is helping Apple openwash their work. But I’m seeing this same utterly basic mistake with other tech media as well, like Venture Beat:

https://venturebeat.com/ai/apple-releases-openelm-small-open...


There's no open washing — Apple have released the code (both training and inference), the weights and a research paper explaining. It's about as open as you could expect from them.


I didn’t see training source code, but more like a promise to share insights. Am I missing something? Regardless another reason this isn’t open source is because it used a proprietary license instead of an OSI approved open source license.


I love this term, openwashing. It stumbles off the tongue.


It's the invention of terms like this that makes it hard for people to take some activist groups seriously...


Why such a negative take? Open washing is similar to green washing, a practice where companies pretend to be environmentally friendly for marketing purposes. To me the term makes immediate sense. What am I missing?


Openwashthepolice


Any alternative suggestions? It's similar to green washing and other such practices.


Apple is going exactly where I predicted they’d go. They’ve already had machine learning hardware on their phones for multiple generations, mostly “just” doing on-device image categorisation

Now they’re moving into more generic LLMs, most likely supercharging a fully local Siri in either this year’s or next year’s iOS release

And I'm predicting that a way for 3rd party developers to plug in to said LLM to provide extra data will be released during WWDC. So you can go "Hey Siri, what is the current news on CNN" -> the CNN app provides data to the Siri language model in a standard format and it tells you what's going on.


The secret sauce will most likely be tight integration with Shortcuts, enabling the language interface to drive and automate existing apps.


I really hope that Shortcuts gets a UX overhaul. It feels so painful to write and test new shortcuts.


"play an hourly chime, oh and by the way remind me to get coffee when i'm on the way home tomorrow" no ux beats that but text/voice.


Shortcuts is getting to a point where I'd prefer to just write actual code instead of fighting with the stupid Scratch-style coding blocks.


That sounds a lot like AppleScript... https://en.wikipedia.org/wiki/AppleScript

I wonder how hard it would be for Apple to rebuild Shortcuts around an AppleScript backend to allow power users the ability to edit the scripts directly.


Huh, they used the Pile - that's a pretty interesting choice for a corporate research team?


There are many variants of the Pile at this point, including ones with copyrighted content removed. They are probably using one of those, like:

https://huggingface.co/datasets/monology/pile-uncopyrighted


I don’t see any source for that in their paper. It just says a deduplicated version of the Pile, with a citation linking to the original Pile paper itself.

[15] Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.


Oh that’s really great to know about, thank you!



I hope somebody will take the time to incorporate a model like these into keyboard typing prediction soon.


I switched from Android to iOS and back to Android because the keyboard typing predictions on iOS were so bad in comparison. It felt like I was using a phone from 10 years ago.


Did the same switch (Android -> iOS), but found SwiftKey[0] is also available on iOS. Better (or more Android-ish?) typing experience.

They also solve the issue for us using multiple languages on a daily basis (first and second mother tongue + English), without having to cycle through all the iOS built-in keyboards just because you want to mix in a few English words/phrases.

[0]: https://apps.apple.com/us/app/microsoft-swiftkey-ai-keyboard...


I thought iOS 17 was using on-device GPT-2 for keyboard predictions.


It’s using something, I don’t know what. It’s a MASSIVE improvement from the continuous slow degradation that ended in 16.x.


Dozens of languages don’t have that. My language doesn’t even have prediction, only broken autocorrect.


This sounds like it is fully reproducible: "it also included reproducible training recipes that allow the weights (neural network files) to be replicated, which is unusual for a major tech company so far"

Digging into the code, is this the best definition of the datasets used? https://github.com/apple/corenet/blob/0333b1fbb29c31809663c4...

And the paper says the instruction tuning is based on UltraFeedback, this config seems to say exactly in what form: https://github.com/apple/corenet/blob/0333b1fbb29c31809663c4...


"Small Language Models" for on-device use. Neat.

> The eight OpenELM models come in two flavors: four as "pretrained" (basically a raw, next-token version of the model) and four as instruction-tuned (fine-tuned for instruction following, which is more ideal for developing AI assistants and chatbots)


Does anyone know whether this will be on ollama? Is it a matter of someone caring enough to implement that, or is there a technical limitation?


https://github.com/ollama/ollama/issues/3910

Currently not supported in llama.cpp; if someone has the time to check out OpenELM, it will probably be implemented.


The Safetensors instructions https://github.com/ollama/ollama/blob/main/docs/import.md should work.


I find it fascinating how late Apple is and how they always release some low quality products with a twist.

They really give me Nintendo vibes. They can't compete on hardware, software, etc... So they put their resources into weird things that let their fanatical users say:

"Apple has the best X"

Since no one actually cares to have the best X, it's nearly useless, but... it's great for marketing and post-purchase rationalization.


> They can't compete on hardware

Every iPhone generation is the most powerful phone on the market.


No aux port though, cheapie phone. Also it's completely unusable for my purposes due to Apple's poor security record.

I have important secrets.


If they don't fix Siri in the next year, I really should drop iOS. It's sad.

Why can't I use my device to talk to GPT-4 from the lock screen?


You can map ChatGPT to the action button and talk to it from the lock screen.


Because running GPT-4 for hundreds of millions of iOS users is not an easy task - especially if there is no subscription model behind it.


I subscribe to GPT-4; there is no reason I shouldn't be able to replace Siri, beyond Apple's fear of not making future profits.


> there is no reason I shouldn't be able

iOS has been a walled garden since day 1. It's hardly surprising, and I doubt it'll change any time soon.


EU is working on it.


I would be curious how much less computationally expensive these models are. Full-blown LLMs are overkill for most of the things I do with them. Does running them affect battery life of mobile devices in a major way? This could actually end up saving a ton of electricity. (Or maybe induce even more demand...)


It probably helps that Apple Silicon has dedicated die space to the Neural Engine - essentially a TPU. No good for training, great for inference.


I’ve been reading up on this recently but devs say ANE is kinda a pain in the ass to leverage; most OSS is using gpu instead


these most likely aren't using the Neural Engine

the ANE seemed to be optimised for small vision models like you might run on an iPhone a couple of years ago

these will be running on the GPU


I bet these can all run on ANE. I’ve run gpt2-xl 1.5B on ANE [1] and WhisperKit [2] also runs larger models on it.

The smaller ones (1.1B and below) will be usably fast and with quantization I suspect the 3B one will be as well. GPU will still be faster but power for speed is the trade-off currently.

[1] 7 tokens/sec https://x.com/flat/status/1719696073751400637 [2] https://www.takeargmax.com/blog/whisperkit
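For a rough sense of why quantization matters here, a back-of-the-envelope sketch of weight storage at different precisions follows. The parameter counts are the nominal 1.1B and 3B figures mentioned above; the helper name is hypothetical, and the estimate covers weights only (no activations, KV cache, or per-block quantization metadata), so real footprints will be somewhat larger.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in GiB.

    Assumes every parameter is stored at the same precision, which is
    a simplification of how real quantization schemes work.
    """
    return num_params * bits_per_weight / 8 / 2**30


# Nominal sizes taken from the comment above; precisions are illustrative.
for name, params in [("1.1B", 1.1e9), ("3B", 3e9)]:
    for bits in (16, 8, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{name} model at {bits}-bit: about {gib:.2f} GiB of weights")
```

Going from 16-bit to 4-bit weights cuts storage by a factor of four, which is a large part of why the bigger model becomes more plausible on a phone.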


indeed, but probably not as written currently?

i.e. they would need converting with e.g. your work in more-ane-transformers


I wonder if there could be a tiered system where a "dumber" LLM fields your requests but passes them on to a smarter LLM only if it finds its confidence level is below some threshold.
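A minimal sketch of what such a tiered setup could look like, purely as an illustration. Everything here is hypothetical: the two toy models are stand-ins, and how a real small model would produce a trustworthy confidence score (mean token probability, a separate verifier, and so on) is left as an open assumption.

```python
from typing import Callable, Tuple

# A small model returns (answer, confidence in [0, 1]); a large model returns an answer.
SmallModel = Callable[[str], Tuple[str, float]]
LargeModel = Callable[[str], str]


def cascade(prompt: str,
            small: SmallModel,
            large: LargeModel,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Answer with the small model unless its confidence is below threshold.

    Returns (answer, name of the tier that produced it).
    """
    answer, confidence = small(prompt)
    if confidence >= threshold:
        return answer, "small"
    # Escalate only when the cheap model is unsure.
    return large(prompt), "large"


def toy_small(prompt: str) -> Tuple[str, float]:
    """Stand-in for an on-device model: confident only on prompts it knows."""
    known = {"what time is it?": ("It is noon.", 0.95)}
    return known.get(prompt.lower(), ("I am not sure.", 0.2))


def toy_large(prompt: str) -> str:
    """Stand-in for a larger remote model."""
    return f"[large model answer to: {prompt}]"


print(cascade("What time is it?", toy_small, toy_large))
print(cascade("Explain GPL compatibility", toy_small, toy_large))
```

The threshold is the main tuning knob: set it higher and more requests go to the expensive model, set it lower and more wrong-but-confident small-model answers slip through.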


Did anyone try to test any of them?



