i'd say within 5 years apple will have optimized apple silicon and their tech, along with language model improvements, such that you will be able to get gpt-4 level performance in the iPhone 19 with inference happening entirely locally.
openai is doing great work and is serious competition, but I think many underestimate big tech. once they're properly motivated they'll catch up quick. I think we can agree that openai is a sufficient motivator.
I really think you have hit the nail on the head here. Apple has a ridiculous, almost unfathomably deep moat for training and running personalised, customised LLMs and other AI models on the 'edge' with these Apple Silicon chips in all their devices.
We must be talking orders of magnitude differences in operational cost, not to mention completely unique features like privacy. The very definition of disruption, waiting in the wings.
Not only that, but this model of 'wait in the wings and pounce when the tech is really there and absolutely nail it' is Apple's forte, aligned completely with the cultural and strategic DNA of the company.
> Apple has a ridiculous, almost unfathomably deep moat for training and running personalised, customised LLMs and other AI models on the 'edge' with these Apple Silicon chips in all their devices.
Do they? I can completely fathom, given my own anecdotal experiences, how garbage Siri quality is today. I haven't built anything against Siri APIs, but I've used Siri and various integrations, and every single time I give her a shot, she disappoints me. It can do cookie-cutter, super well-traveled code paths that were engineered together for demos, basically just one-off tricks without cohesion, but the whole infrastructure around Siri is not open and has not been improved across many OS versions in any way that I've discerned... But I am not an expert with these systems or APIs.
The Neural Engine architecture and the M1/M2 hardware leaps (which are absolutely mind-bogglingly impressive) are both really superb technologies from my layman's view as a mere software engineer, with improved battery life as well as performance, a real improvement over Intel x86 architectures. I just don't see Apple as a serious AI player right now, despite their head start in AI with the Siri acquisition, and despite these hardware leaps that may make it easier to do cool stuff like this post in the future. The OCR stuff in Photos is cool and useful and I use it every day, usually to look up my Known Traveler Number.
Ramblings over, just wanted to ask you to opine on anything Apple-LLM related if you have any additional context!
- siri is incredibly underinvested. They have another team that's building some sort of search and natural language processing engine, which has slowly sapped away some key headcount from the siri team.
- apple doesn't get the full advantage of tons of user data from the wild. this is both a bug and a feature
- the siri api model is clearly generations old, and i suspect that apple has been marshaling its resources into a big leap forward - alongside their hand tracking, ar, and hardware
- apple has shipped everything required for you to point at a light /in your house/ to turn it on or off (and optionally flick). this includes software - the individual components are built and ready - the only thing missing is gluing it together
There's something happening, and I think that the rumored glasses are the hardware totem.
Lots of things with Apple come out of nowhere. In this case there's actually quite a bit more evidence than usual because of all the fielded, underutilized hardware that's ready to go— when it launches, it'll likely be available on millions of devices, rather than as something you have to get the latest hardware to have, as is the case when the camera or screen undergoes a drastic improvement, or there's a new feature like wireless charging or NFC payments.
> because Apple are probably working on a top secret new thing that will blow everyone else out of the water
that's a mischaracterization of what i said. what i said was that they have been clearly hiring key positions and cannibalizing the siri team for something new. that something new will likely be a major release. i also believe that the ar headset is the unifying product under which they're rallying.
You may well be right, but if so that's a huge problem for Apple as by all accounts the AR device is still many years from being ready for release and meanwhile they are in a very precarious situation with Siri...
The first proper integration of a Whisper and a GPT-class LM will be a big step change: a Siri-like AI that you can actually talk to somewhat sensibly and expect it to "understand" more than pre-set phrases... Google could be in a position to release something for Android, and there are Android SoCs with neural accelerators as well (sure, nothing as impressive as Apple's chips, but Samsung have demonstrated a cut-down SD model on their SoC).
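To make that concrete, here's a rough sketch of what such a pipeline could look like, assuming the open-source whisper Python package for speech-to-text; local_lm_generate is a placeholder for whatever on-device LLM binding you'd use (llama.cpp, etc.), not a real API:

    # Sketch of a Whisper -> local LM assistant loop (hypothetical glue code).
    # Uses the open-source `whisper` package; `local_lm_generate` stands in
    # for whatever on-device LLM binding you have.
    import whisper

    def local_lm_generate(prompt: str) -> str:
        # Placeholder: call out to your on-device LLM here.
        raise NotImplementedError

    def handle_voice_request(audio_path: str) -> str:
        stt = whisper.load_model("base")           # small model, runs on CPU
        text = stt.transcribe(audio_path)["text"]  # speech -> text
        prompt = f"You are a helpful phone assistant.\nUser: {text}\nAssistant:"
        return local_lm_generate(prompt)           # text -> response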
I don't think the new siri is intrinsically tied to the device. I think that as part of remaking siri, they're using the new platform as a rallying cry and story as to why it's important. (it's hard to justify features such as making siri spatially aware otherwise)
Yes, it's been stagnant for so very long that I, and I suspect most users, have learned the very few things that it can do reliably, and no longer even try to do something more sophisticated. If it became more capable, it might take a long while for most users to even notice, because we have stopped hoping it will pleasantly surprise us.
Most people didn't notice, but dictation now runs in a hybrid mode, both on-device and via server. So you get some (maybe poor) on-device results at first (the only results if you're offline, but better than nothing) that get rewritten a couple of seconds later by the server.
I'm not a VR/AR fan, but if Apple can come out with a super high quality AR headset that can augment/replace monitors, I would buy one instantly. I might be earlier than the mainstream adopters, but if the tech ever gets there this will be a natural shift.
I won't buy Meta products but I keep waiting for someone to do VR workspace right - where I can truly work and travel from anywhere with a decent chair and a desk for keyboard (no need to lug around monitors, no need for huge table).
I think that's it. I had half an hour to kill in a cafe yesterday, didn't bring my kindle, and I thought "if there were compact AR or even VR glasses that I could whip out to emulate a full-size working environment, that would be super handy right now".
Having said that, for me to spend money on it, it would have to (a) realistically replace my setup at home, which currently consists of 2 HiDPI screens, and (b) not be that much more expensive than what that cost me.
I would not love wearing them. I waste a significant amount of money on contact lenses each year for the privilege of not having to wear anything on my face.
There is a difference between the subjective and objective "cool." They've sold something like 150 million pairs of Airpods? I think that qualifies as objectively "cool."
> apple has shipped everything required for you to point at a light /in your house/ to turn it on or off (and optionally flick). this includes software - the individual components are built and ready - the only thing missing is gluing it together
Could you expand on this bit? I’m pretty deep in the Apple ecosystem, but I’m not sure what you’re referencing here
Homepods have U1 chips. They can position airtags within your house. You can put an airtag under, or adjacent to, a lamp and link it to a smart switch. Your apple watch has gesture detection (still in beta, tbf), as well as U1 and real time positioning capabilities.
Everything required for apple to know not only where you are, but which way your hand is pointing, as well as where "smart devices" are in your house is already being sold and rolled out en masse to the majority of people in the ecosystem.
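The missing glue is mostly geometry. A toy sketch, assuming you already have a watch position and pointing direction from U1/motion data and a known lamp position (all of the inputs here are made up; none of this is a shipping API):

    # Toy sketch: decide whether the user is pointing at a known device.
    # Inputs (watch position, pointing direction, lamp position) are assumed to
    # come from U1/UWB ranging and watch motion data; none of this is a real API.
    import math

    def is_pointing_at(watch_pos, direction, device_pos, tolerance_deg=10.0):
        to_device = [d - w for d, w in zip(device_pos, watch_pos)]
        dot = sum(a * b for a, b in zip(direction, to_device))
        norm = math.dist(device_pos, watch_pos) * math.sqrt(sum(a * a for a in direction))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
        return angle <= tolerance_deg

    # e.g. watch at origin pointing along +x, lamp 2m away slightly off-axis
    print(is_pointing_at((0, 0, 0), (1, 0, 0), (2.0, 0.2, 0.0)))  # True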
$25 for an Airtag, plus say $20 for a smart switch. Plus $200 for a HomePod.
To turn a single lamp on/off by pointing at it, something that nobody ever wants to do (alright, once for the cool factor). If someone wanted to overpay for useless features, they can already go for a Philips Hue.
I can turn off my lamps from anywhere in the world using a $10 Tuta ZigBee bridge and an $8 LIDL light.
Meanwhile, you could use an iphone as a webcam for years by just buying an app, you could use an ipad as a cintiq by just buying an app, and using your phone with your computer via kdeconnect has also been around for years. Pretty much nobody cared. Then apple came out with their branded version and suddenly you have people praising it as the best thing since sliced bread and great "innovation".
So my prediction would be that apple releases a ZigBee bridge and light for $50 in about 3 years and calls it a fancy name. People will tell you how absolutely essential and innovative it is and how you could never get that without an iDevice.
I think that this is one of those things that sounds like you'd never do it, but...
I've put zigbee switches into every wall switch in my house and now when I stay in hotels, I forget to turn off the lights before getting into bed.
>something that nobody ever wants to do
I think that this gesture, if it works well, is something that will be a killer app for smarthomes. The other point of friction, which is interop, has been more or less solved by thread/matter. Homepods are $100, btw. Not to mention what happens if apple integrates u1 into a smart switch/bulb, which is something that the thread protocol allows for.
Not OP, but I suspect this is a simple combination of AR and precise device location awareness. The former comes in the form of ARKit. I’m not aware of a framework for locating devices, but it wouldn’t surprise me if there were one.
As a mere user this is what I had assumed. Siri has been stagnant and buggy for years; it makes sense they are working on a big replacement instead of putting resources into incremental improvements on what seems to be a weak foundation.
> apple doesn't get the full advantage of tons of user data from the wild.
Do you think OpenAI did? Perhaps Apple doesn't have the vast amount of data that Google has, but if OpenAI managed to have like 200 wikipedia-sized corpuses of different textual data for their English GPT models, that's certainly not out of reach of Apple.
> > Apple has a ridiculous, almost unfathomably deep moat for training and running [LLMs]...
> Do they? I can completely fathom, given my own anecdotal experiences, how garbage Siri quality is today.
Siri is a dead end. When Jobs bought Siri (what, 10 years ago?) he explicitly junked almost all the AI back end, mainly buying the speech recognition engine. I didn't understand why and still don't (but strangely he didn't ask me :-).
John Giannandrea has run Apple's AI effort for the past five or six years. He is the reason Google has a big AI effort (he consolidated a bunch of AI projects and bought DeepMind, etc) before he decamped for Apple. For all I know the Siri team isn't even part of his remit.
You can never look into Apple (even if you work there) so one can only speculate based on what visible signs appear. But Siri isn't one of them.
The cinema industry knows how to squeeze a billion dollars a year out of Apple. Do the Apple or Sony "social engineers" have the total view? When Amazon was skyrocketing and Blue Origin had not reached orbit, a niche producer from the cinema industry had Jeff B paying a lot, although he might have backed out of that deal, like giving up on the Android phone with a lot of cameras.
The part that Siri is bad at will be commoditized — someone will open-source a GPT-4-level language model. And Apple's moat will be being able to run that on-device with all the attendant benefits (privacy, zero marginal cost to the company, availability in more scenarios, etc)
You realize that these language models are like hundreds of GBs in size and consume tens of GBs of memory. Last time I checked, apple still ships their products with less than the market average in both of these specs. If you want a local running LLM on an iphone, get ready to sell a kidney.
You can today run an LLM vastly better than Siri on a few GB of RAM using Llama 7B at 4-bit quantization and alpaca.cpp. This is moving so fast, every day there is something new coming. There won't be any moat in LLMs soon or even in dedicated HW as it turns out you don't need that much for "basic intelligence".
Note I'm not suggesting you can pack the full knowledge base of humanity into those 2GB of RAM, but the key feature of an edge AI is simply to understand instructions, something Siri and OK Google struggle with at best...
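Rough back-of-the-envelope numbers for why a few GB is enough (ignoring KV cache and runtime overhead, so real usage is somewhat higher):

    # Approximate weight memory for a 7B-parameter model at different precisions.
    params = 7e9
    for bits in (16, 8, 4):
        gib = params * bits / 8 / 2**30
        print(f"{bits}-bit: ~{gib:.1f} GiB")
    # 16-bit: ~13.0 GiB, 8-bit: ~6.5 GiB, 4-bit: ~3.3 GiB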
(assuming we're not talking about the near future)
I think this can be a scenario of converging incentives: on one side, large models will incentivize hardware manufacturers to increase the memory available on devices, while on the other side, model developers will be incentivized to trim the fat on the models and devise compression mechanisms that don't compromise quality too much.
It's not unthinkable to imagine a handheld device able to run full inference locally a few device generations in the future.
yeah, makes sense. At least these integrations at a lower level are happening now. I just can't help but feel disappointed that we didn't see earlier cohesion at least a few years ago, maybe around the early or pre-M1 era when there were all those A7, A8 whatever SoCs. But in retrospect, that is basically 10 minutes ago in the history of the universe so yeah, I'm just overly excitable!
Alexa is DIMENSIONS better than Siri. Siri can create a timer ... okay even an alarm. That’s it.
It is comically bad. Their text to speech is excellent, but the rest is unbelievably bad.
If Apple has some kind of silver bullet, it’s time to put it out or be left behind.
I don't have all the details because of NDAs, but I ask questions and get nods and grins from colleagues and friends at chip companies; Apple's (and Nvidia's and Intel's) silver bullet to cloud-hosted, software-based AI is AI chips.
We’re circling back around to local compute being the default as hardware performance of next gen phones and tablets reaches a “good enough” point for most users.
There will be scientific problems that will require modern server clusters but most consumer facing AI needs will be done on hardware within a decade.
I’m not saying AGI in a decade, I’m saying cutting edge logic embedded in software now will be the basis for logic in chips in the years to come.
Honestly I observe zero progress in the last 10 years. My first iPhone was 4S. My current iPhone is 8. My use-case is absolutely the same. Calls, SMS, Whatsapp, navigation, books, browser. The same with macs. I boot, open IDE, Terminal and that's about it. Yes, hardware's getting faster (which is compensated by software getting slower). But nothing changed dramatically.
I think that the only thing that changed is adoption of fingerprint reader. Both in iPhone 8 and in mac.
AI might be that thing that could tremendously change my usage pattern. But it should be as smart as human assistant. GPT4-level intelligence might have the necessary power for that.
I don’t think Apple necessarily needs to lead the pack on LLM research. They just need to take the SOTA FOSS model and bake it into silicon, using their chip design expertise.
If you do that, you can do inference on mobile devices, which is a huge privacy win; it plays into their general privacy positioning in a big way. If you open up that SDK to developers it would be the iOS app gold rush all over again.
Note, Google is also moving in the direction of on-device inference with Coral, but they obviously will want to transmit the personalized model weights back to the mothership.
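To be fair, the developer path for on-device inference already exists today via Core ML; a minimal sketch of converting a toy PyTorch model so Core ML can schedule it onto the Neural Engine (the model and shapes here are placeholders):

    # Sketch: convert a traced PyTorch model to Core ML so iOS/macOS can run it
    # on CPU/GPU/Neural Engine. The model is a trivial stand-in.
    import torch
    import coremltools as ct

    net = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU())
    example = torch.rand(1, 128)
    traced = torch.jit.trace(net.eval(), example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example.shape)],
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.ALL,  # let Core ML use the ANE when it can
    )
    mlmodel.save("TinyNet.mlpackage")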
Honestly, it speaks volumes about Apple as an organisation that they did not invest all of their resources into Siri, especially when you consider how Amazon invested in Alexa etc. Everyone has long understood that the "personal assistant" represents the pinnacle of what not just AI but the personal computer should be able to offer -- the mistake this past decade was thinking that had to do with then-current ML paradigms or with these completely useless voice agents. That Apple could realise it was a dead end and not sink investment/engineering resources into the tool any further than what anyone actually uses these tools for (changing the song, lowering the volume, etc) is outstanding.
Ultimately, I'm very reassured that it's Apple and Microsoft leading the way for personal and business AI assistants respectively. As Stratechery has emphasised, the latter not only already has all the data from most firms outside Silicon Valley, but more importantly has always adhered to and designed their software according to a computer-as-a-tool philosophy versus Google's "ML/AI will do everything for you behind the scenes"; the former has not only made a show of data security but invested massively in the hardware architectures necessary to making on-device AI a possibility. Without at all being a "fanboy", Apple is quite literally the only company I would trust my personal data to for use in a GPT-based assistant.
Amazon's investment in Alexa is a perfect example of why Apple didn't (and shouldn't have) invested all of their resources into Siri. The Alexa team is getting seriously gutted in these rounds of downsizing [1], or as CNBC puts it "the team behind the technology was a prime target of the largest layoffs in the company’s history."
What they have invested heavily in is the Apple Neural Engine ("ANE"), special silicon right on the SoC to handle ML / AI code. Optimize on a server, then run the model on your iPhone or, probably soon, your Apple Watch.
WWDC this year is going to be very, very important.
Siri is garbage because it's trying to be a real AI, as it was understood to be at the time. Alexa works because it's not an AI but it seems as if it's an AI.
If you're the exec in charge of this at Apple, what are you doing right now? It's not watching OpenAI become what Siri could be, not if you have something to do about it. Your motivation levels are extremely high.
Maybe, but Apple doesn’t have a search to rival Google (or even an assistant, given the state of Siri).
Focusing on privacy and on-device learning is great, but when the strength of these models is in consuming all the data they can hoover up your motive is at odds with your philosophy.
Apple gets $18B a year from Google to be the default search engine on Apple devices. Why would Apple build a search engine that would probably be inferior and not be able to monetize it?
I only use Siri with the original Homepods. It has always been terrible. You cannot even tell it to start playing movies on the TV that you have already purchased from Apple itself in the TV app. It will start to play some random song or something.
I should be able to say hey siri, start playing <name of media> on <name of Apple TV>, and it should be able to start the TV and start playing it.
If you have the magic phrase to make it work, I'd like to know. I tried for about five minutes while driving yesterday, to no avail.
For me, Siri is like an 80s text adventure game, except I was better at those.
Edit: just now I was able to call by interacting a second time with Siri, and using the physical button. Using the button never would have occurred to me while driving.
The perfectly obvious use cases Siri overlooks are maddeningly numerous. My go to example was "text this photo to <contact name>" For years I checked each new iOS release to see if that was enabled, and for close to ten years, no. I'm not sure whether it was the most recent, or the one before (I gave up somewhere along the way) but now you can (finally) do it.
I work around this by asking it to find Toyota in <my city>. Then if it finds the dealership, it asks if I want to navigate there, or call them.
I find the worst errors to be when I ask it for information, or to send a text, and it instead places a phone call to someone. It will even call people that I have never called on my phone (a fact it should know), without asking first.
And AppleTV itself is hot garbage for this kind of stuff even though it's a veritable supercomputer that should be the hub for Siri and home automation.
- Hey Siri, <do something that involves AppleTV in any capacity>.
And that has nothing to do with an LLM. That has to do with speech-to-text algorithms. It also doesn't take advanced AI to do simple intent-based answers.
They have every query their users enter into Spotlight. They have all the results it returns (who cares if they originated from Google, ok maybe they do, maybe not using it for anything is part of their deal with Google). They have your contacts, calendars, messages, email, music, workout history, tasks, real time location, which Siri already uses to e.g. recommend a destination when you get in your car to a degree of accuracy that straight spooks me sometimes.
Apple has access to SUBSTANTIALLY more fine-grained personalized data than Google does. Full stop. That's a weird thing to assert in the tech crowd, but think about it deeply and you'll realize it's true.
Google has islands of services that they've done an extremely good job of building bridges between. They have some islands that Apple has nothing like (YouTube is the biggest one) (Search is a huge island, but only defensible if the interfaces customers use to access it don't leak the queries and results; no one opens a browser and types "www.google.com" on mobile, and even if they did, Apple controls the browser renderer, they could lift everything if they wanted to, they won't, I'm simply illustrating how much Apple should fuckin scare Google). But the waters between those islands are patrolled by Samsung, Xiaomi, Oppo, and Vivo, and there's sharks in there as well (uh, the metaphor is falling apart, the sharks are "Google's tumultuous history with privacy and user blowback if they overstep").
Apple is a continent. Many of their cities are ports that connect to some of Google's islands, but Google's product still has to be checked by customs.
You're right that even given that power, hoovering it up is at odds with Apple's philosophy. Or is it? I think "hoovering it unencrypted to the cloud" definitely is; but that's the point OP was making: if we're extremely close, as a species, to solving "make AI work", one of the challenges for the next five years is inevitably going to be "make it more personal". It's awesome that I can ask ChatGPT to write a date comparison function. It'd also be awesome if I could ask Siri "when did sarah and I talk about getting a cat" or something.
That requires personalized data to set the context. If Apple can swing for the fences and say "everything is on-device, nothing leaves, we can't see it, and now you can ask Siri that, and by the way there's new functionality built into iOS that apps can leverage to integrate with Siri's new LLM capabilities just like ChatGPT plugins", that's an extremely compelling product. Extremely. And I know they would ingest that data, that they would do that, because they already do! Go ask Siri to call Sarah, or if you have any meetings tomorrow (assuming you're using Apple Calendar), and it will respond. I don't know where you're getting this take that they don't "hoover data"; they ingest everything from all their first-party apps into Siri's `DATA MATRIX`. It's just, you know, 2010s-era querying and data crunching.
Don't get me wrong, Google's gonna do a lot of this too. But Google is playing from the position of "we have a model, we have the data, we just need to pay hundreds of millions of dollars a year maintaining all these servers". Apple is playing from the position of "our customers are paying us for the silicon to run this, we have the data, we just have to figure out the model". If it's not obvious at this point: the models/algorithms/etc are not a moat. They're going to be commoditized with time. Publicly accessible data isn't a moat. Personal data is a moat, because people care about privacy. And cost-effective training and inference silicon is a moat, because it's Physical, and like literally One Company on the planet makes it, and they're in a big time situationship with Tim Cook.
Yes you do. It trivially does. Search for "Burger King" and see how many results that generates which couldn't possibly be served locally (undownloaded Apps from the App Store, Maps, search recommendations, etc).
They may not be storing these queries or using them for further analytics, training, etc. But they could.
TSMC manufactures the silicon for nearly all of the current generation high end silicon. The only exceptions I've been able to find are, naturally, Intel's CPU lineup (which has nothing to do with AI) (and, to be clear, Intel Arc is also manufactured by TSMC), and the Snapdragon 8cxg3, which is manufactured by Samsung; and the "advanced-ness" of the 8cxg3 is pretty miserable compared to A16, M2, RTX 4000, RDNA3, or Tensor Processing Units.
Apple has purchased the entirety of TSMC's 3nm production for the next 1 to 2 years. They can do that, and can continue to do that, because they have a ton of money. They have a ton of money for buying chips because there isn't some crazy business middle-logic justifying the cost of these chips' performance; they buy chips, they sell chips to consumers. In comparison, literally zero other customers of TSMC derive the majority of their hardware revenue from selling to consumers. Companies like Nvidia make some money this way, but most of their money comes from data center sales, which have their own business justification for buying that fluctuates (ChatGPT subscriptions? training models? is the past model good enough? etc).
I think this is wrong because you can always build a search engine - it’s not hard - and you can supplement the learned models with cryptographically secured private data to build a private personalized model. In fact this seems in all ways superior to any alternative, and strikes at the heart of the Google spy ware ad spam business model.
And seriously, did you really try to assert that the Google whatever-it-is has outperformed Siri? I'd love to meet someone that uses it so I can verify this statement. But with the advent of ChatGPT, Alexa, Siri, and Google whatever all look like freshman projects. The next generation of assistants has just begun, and I assure you, the state of the past has nothing to do with what comes next. OpenAI hit a giant reset button across an awful lot of stuff.
I have both and find that 90% of the time my commands are “remind me ___”, “wake me up at ___”, “what’s the weather today”, and “play ___ on Spotify”. For these, I can’t tell the difference between google and Siri in terms of quality (but Alexa was worse).
I was just setting reminders today and had Siri repeatedly fail to do what I want. The only use case I have for it is setting timers, which it does very well. I'd use it for much more if it was as good as ChatGPT.
This is entirely my point. They’re all crap at a fundamental level, none of them outperform any other. But the next generation, which will assuredly be backed by a LLM, will be shockingly powerful.
The MKBHD personality is onto Monte Carlo testing in his device reviews. He has an intuition for good UIX. Google and Apple should pay people like MKBHD and his production crew a billion dollars every year to film achievable, desirable text, voice, camera A.I. assisted use-cases.
They don't work like that though. Instead they spend $3b / year on hiring fungible cog SDE 2 job code 67483's and look at gantt chart roadmaps produced by fungible cog PM 3 job code 74842's for fungible cog SDM 7 job code 84747's with the nominal assistance of centralized (i.e., marginalized) fungible cog UX 5 job code 35563's, then wonder why it's perennially late and complete garbage. Except no one wonders that because fungible cogs don't have wonder in their job responsibility matrix.
Yeah, Apple is slower to adopt some things, but I noticed when they hit it... they don't just knock the ball out of the park... they elevate the entire game.
I dominated a 13" mbp m1 for like two solid years on hardcore startup software engineering. The base model. It was almost entirely perfect until the last few months when I upgraded to the new 16".
These are alien devices. They're just not possible yet here I am using one.
Siri was supposed to become locally processed but sometimes can’t even set a timer because of a connection time-out. Or she’ll use her fancy ML speech recognition model to turn “set a timer for 3 minutes 10 seconds” into “search for trinity tensor”.
Siri is so bad it needs to be scrapped and rethought from the ground up (it can’t even give me the time when internet is down at my rural house). No one at Apple who wants a career there would dare propose that. The switch to a LLM architecture could be the perfect transition point for this.
If Siri can't give you the time without the internet, I think you need to update iOS.
There are definitely two tiers of Siri queries. There are queries like "set the brightness to 10%" or "set a timer for 5 minutes" which absolutely and consistently work without internet, and have for several years, and if you're legitimately having a different experience then it's possible a cosmic ray hit your iPhone (or, realistically, you're running into a strange and rare bug which is not indicative of the general experience and will be fixed in three to five years, or maybe never). There are also queries like "create a new note" (from the Notes app) which should be able to work offline, but don't. And, naturally, there are queries like "when did Resident Evil 4 come out" which wouldn't reasonably work without internet (but, if you're curious, she does get it right).
In other words, Siri clearly does some local guessing as to whether she can answer a query without the internet, and some queries appear to be miscategorized into the second bucket. My leading theory on why this happens, which may be incorrect: if an app has any Siri functionality which requires the internet to answer, all of the queries which are responded to by that app have to require the internet. It doesn't matter where the processing ends up actually happening to respond to the query, it just shuts the query down. It's weird, but it's consistent with the behavior I've seen.
The more important point: Siri's real and weird limitations don't seem to have much to do with limitations in local processing. They've said that the speech interpretation all happens on-device. They encrypt practically all of your data that does get shipped to their servers, so a query like "Open the note titled 'Hello World'" is probably also being processed on-device. But: Siri still requires the internet for that query. That doesn't seem like a significant limitation with their ML algorithms or silicon or anything meaningful; it seems like just a case of dumb coding, which is certainly something Apple is no stranger to.
Exactly. You can see Apple struggle to build anything of quality with their software stack. Good luck making anything from this. So much great HW sadly wasted.
In addition, Apple is notoriously slower at adopting the most cutting-edge technologies, but historically they tend to consider things very thoughtfully and roll them out in beautiful launches (not always).
Examples would be the fact that Samsung almost always beats them to the punch on camera technology and other whiz-bang features, but Apple eventually adopts them to much mass-market consumer acclaim and groans from Android techies. Another example is they're just now considering touch screens on laptops.
A counter example is the TouchBar - which was “innovative” but many didn’t like.
> Apple has a ridiculous, almost unfathomably deep moat for.. running personalised...
Apple has decent hw but no LLM software. Unfortunately, in LLM space software changes are the ones driving performance for now, since the space hasn't stabilized yet. Since their competitors control the software, they get to adapt it to their hardware. That is, Google and Microsoft are going to adapt GPT/Bard to Qualcomm ARM etc. while Apple is being ignored. Unless Apple gets in with their own LLM (quite possible), their hw advantage will end up not mattering one bit.
They might not even need to train their own. There are many open implementations of models that don't have an academic-only restriction. Right now you can use tensorflow.js, which runs on a phone's GPU using WebGL. If you build the API out, developers will use it.
iCloud has screamed ineptitude for decades. I don't see how Apple can possibly develop successful products that can't be held in the hand and fetishized by myopic designers. Much like a stereotypical grandparent raised during the Great Depression, Tim Cook still leads like a supply chain miser trying to save a company that has long prospered -- they tragically squander the unprecedented capital at their disposal. Change my mind, Apple please.
OpenAI's relationship with Microsoft pretty much makes it big tech. I think you have a point as long as Apple keeps innovating its hardware, but it's not really a David vs Goliath situation.
Nvidia is also no David to Apple’s Goliath. They are both Goliaths who will most likely be battling to build the best hardware for LLMs in the near future.
There may be a short-term panic to get hold of CUDA GPUs, which will allow NVDA to increase their prices a lot. But then other chip makers and software creators will build alternatives to NVDA.
Look at AMD vs NVIDIA RTX implementations.
Nah, NVDA is a solid buy for long term. They also almost acquired ARM (Which shows where their heads are at at the very least.)
Owning ARM, the thing that is inside ya know, everyone's phones, routers, etc
You can already run stuff like it on mobiles, for example alpaca.cpp with Llama models runs fine on the CPU on any device that has a few gigs of RAM. Getting to even larger models using weight quantization, distillation etc. isn't 5 years away, it's 5 weeks or maybe 5 months at most...
Btw transformers are really simple and the optimisations making them run fast on CPUs have come a long way. I don't know the M1/M2 benchmarks for this, but many CPUs for edge devices have NN accelerators on the silicon that can run this, it's the pairing of the accelerator and gigs of RAM that is the key.
Greed is a fabulous motivator but it can't surpass the laws of physics. This is probably why we don't have a cure for cancer or room temperature superconductors or jurassic parks.
It may be that scaling a language model like GPT-4 down to hardware systems orders of magnitude smaller than the ones used by OpenAI is simply not possible.
I'm not saying it's impossible but it's fallacious to just assume outright that it's inevitable because market forces.
For all we know it could turn out that the only way to get a GPT in pocket format is through some form of analog chips that must be individually trained, which Hinton aptly described as an era of mortal computing.
There is nothing impossible about a Jurassic park in terms of laws of physics though. It's a matter of getting the right materials to recreate dinosaurs.
Same goes for cancer cures. I'm not sure if there are physical limitations preventing a room temp superconductor.
> I'm not sure if there are physical limitations preventing a room temp superconductor.
Not really. We just know that it’s very uncommon at best. The limit is material dependent and we already have superconductors with more than one order of magnitude difference in their critical temperature (e.g. ~4 K vs ~40 K; YBCO, which is widely studied, is at 90 K). It is not inconceivable that we could come up with some fancy material with a critical temperature three times as high again. There are several laboratories with good money working on it.
This would be awesome. I don't like having to rely on closed-source cloud services for LLMs.
But what incentive does Big Tech have? Especially considering that presumably they could monetize the cloud services easier. It's not like most consumers care either.
I expect smaller companies like Stability AI and grant/govt-funded research to bring cheap, local inference. There's definitely a lot of demand. But is there more demand than for cloud-based services, so that it's economically viable for the biggest companies (Apple)?
i'd say apple is one of the few companies that could charge people for an "AI embedded" 128GB module, let people pay for a subscription in order to "access" it, yet have the inference happen locally.
one hypothetical is that you need an icloud subscription, a token is retrieved from apple, the token "unlocks" your AI module, on the phone and allows you to do the inference.
in this way apple could charge monthly for this and claim that the inference happens locally. sadly in this way it would be similar to the whole csam debacle
apple is also one of the few companies that could realistically get manufacturers to massively produce a hypothetical "AI model chip" that has the model on device, at a quantity that would make it realistic to pay for in a hypothetical iPhone 19 Pro model.
apple fitness I imagine has a pretty small opex (sure they pay trainers, but that doesn't scale linearly with subscriptions. the videos themselves are limited in scope and could be cached pretty trivially. the actual fitness tracking happens on device), same with apple arcade, which I highly doubt has any additional expense over the app store in general.
I can see it - "Siri+", pay $5 a month for fine tuned model upgrades straight to your device and remote fallback. local inference available for Pro devices only.
If you look at Peloton's financial statements, you'll see that your assumptions are off. For one thing, music costs are significant (Peloton had to settle a $200M lawsuit for not doing it right).
In addition to the trainers, you have producers, graphic designers, artist spotlights, celebrity interviews, etc.
Yes, some costs are fixed so with more subs you get higher profitability but that’s really different than charging for something that doesn’t have a variable cost.
You don't need a subscription to track and view your data (I've got an apple watch and no subscription). I think all the subscription gets you are the training courses (videos, but also live events and stuff)
my point was more around how you need an apple fitness+ subscription to do relatively trivial things like show your health metrics on a TV (which you also need an apple TV for).
OpenAI IS big tech at this point given the massive financial and infrastructure investment from Microsoft. I think we're well past the small underdog narrative.
Apple M series chips are already some of the best chips for running your own models locally, especially if you want a laptop or small form factor machine instead of a big gaming desktop with a big GPU.
Apple is really getting a niche here for machines to run models locally. That’s pretty powerful.
> Apple is really getting a niche here for machines to run models locally.
I... don't agree. They have an acceleration API and a large install-base, but all of these models have run just fine on traditional hardware. GPT-Neo, Stable Diffusion and LLaMa can all be used and accelerated without Apple Silicon.
Powerful, maybe. But not really unique, just putting up table stakes.
Honestly, I've looked at shorting MS due to openai.
Their technology (LLMs), or the secret sauce, can easily be stolen just by the process of putting that tech out there.
Have a look at Alpaca: FB made the underlying model, someone leaked the weights, and now there's a dataset for training it for only a few hundred dollars that can beat openai at its best.
Not everyone needs to employ a PhD for doing customer service, in the same way not everyone needs GPT5 for answering support queries.
TBH alpaca is nowhere near chatgpt or even the gpt models in its capability. I have been playing around with llama for a while now, and even with chains of prompts, the quality of answers is really bad.
I had thought that Facebook would have the ability and competence to produce an LLM which could outclass open ai
I also believe this is going to be the case. The past decade apple has positioned itself as a privacy-focused + good hardware company. The next is going to be offline AI + privacy + good hardware & I am already sold on the idea.
Agreed. Apple is fantastic at waiting for the moment a technology is mature enough to be useful to everyday people and then jumping on it. They have a fantastic CPU department that no one else can match, and certainly must have a skunkworks AI team. They'll let everyone else make the mistakes, learn from them, and release a great product.
I'm similarly skeptical, but that said, I'm running 30B parameter LLMs on my 32GB M1 Macbook Pro every day now. The trick is quantising them down to 4 (or even 3) bits; it's possible to massively reduce the memory requirements. Have a look at [1]
The devs working on llama.cpp have been discussing ways to further reduce the memory requirements by mmapping the large weights files (I thought LLMs mutated the weights as they run inference, but they clearly know more than me about the internals), bringing it within reach of phone memory.
So, iPhones are not as far off the computational capacity to run these models as you'd think. Memory (and to a greater extent, battery and cooling) are the limiting factors. iPads even less so, given they run M1 chips and have much larger batteries & much more RAM
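The mmap idea is worth spelling out: instead of copying the whole weights file into RAM up front, you map it and let the OS page in only what inference actually touches. A toy illustration with numpy (the file name and layout are made up; llama.cpp does the real thing in C/C++):

    # Toy illustration of memory-mapping a weights file instead of loading it.
    # "weights.bin" and its layout are hypothetical; the point is that pages are
    # only faulted in when touched, so resident memory can stay well below file size.
    import numpy as np

    w = np.memmap("weights.bin", dtype=np.float16, mode="r")
    chunk = np.asarray(w[:1_000_000])   # only this slice gets paged in
    print(chunk.mean())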
Off topic, but for what purpose are you running LLMs locally (especially every day)? My understanding was that the prompting required to make them work at all was too great.
A little bit of research, a little bit of actual useful tasks - I'm interested in summarisation, which alpaca is decent at (even compared to existing summarisation-specific models I've tried)
My other motivation is making sure I understand what offline LLMs can do... while I use GPT-3 and 4 extensively, I don't want to send something over the wire if I don't have to (e.g. if I can summarise e-mails locally, I'd rather do that than send them to OpenAI).
It's also surprisingly good at defining things if I'm somewhere with no internet connectivity and want to look something up (although obviously that's not really what it's good at & hallucination risks abound)
On alpaca, I've found "Below is an instruction that describes a task. Write a response that appropriately completes the request. Summarise the following text: " or "Give me a 5 word summary of the following: " to work fairly well using the 30B weights.
It's certainly nowhere close to the quality of OpenAI summarisation, just better than what I previously had locally (e.g. in summarising a family history project with transcripts of old letters, gpt-3.5-turbo was able to accurately read between the lines summarising an original poem which I found amazing).
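(For anyone wanting to reproduce this: a rough sketch of how that instruction slots into the standard Alpaca prompt template used by the alpaca-lora projects; the input text below is a placeholder.)

    # Sketch: building an Alpaca-style summarisation prompt. The template is the
    # standard one from the alpaca-lora projects; the input is a placeholder.
    TEMPLATE = (
        "Below is an instruction that describes a task, paired with an input that "
        "provides further context. Write a response that appropriately completes "
        "the request.\n\n"
        "### Instruction:\n{instruction}\n\n"
        "### Input:\n{input}\n\n"
        "### Response:\n"
    )

    prompt = TEMPLATE.format(
        instruction="Give me a 5 word summary of the following text.",
        input="<text to summarise goes here>",
    )
    print(prompt)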
I half wonder if the change in spelling from US -> UK makes a difference...
I'd run a test on that but I've just broken my alpaca setup for longer prompts (switched to use mainline llama.cpp, which required a model conversion & some code changes, and it's no longer allocating enough memory)
Off topic slightly, but are you running into limits with 32GB RAM that the 64GB model would meaningfully be adequate for? Do you wish you had one of the larger RAM models?
I've been pretty happy with 32GB, but the 30B models do push near the limits. I don't see a big difference between the quality of 65B (running on a 64GB x86 host) and 30B on M1 (although that may be the 4-bit quantisation, so take that with a grain of salt). I'm just glad that I have it on an M1... I have a 3080 in my PC, but when I got that I was thinking more of Stable Diffusion and YOLO tasks rather than LLMs, and it just doesn't have the VRAM for LLMs.
Alpaca seems like it could be significantly improved with better training (some of the old training data was truncated), so I think there's a decent amount of improvement to be had at the current model size.
In the future though... what would really be a meaningful change would be a larger context size - the 8k tokens of GPT-4 was a big improvement for my uses... I would guess a future local llm with larger context would exceed 32GB, but that's speculation beyond my expertise, I don't know how context size and network size scale.
If it was a PC I'd say go for 64GB, but hard to recommend that given how much Apple charge for RAM upgrades. On my next upgrade (2+ years time, hopefully) I'll likely opt for 64GB+ though
Yeah, it is expensive. My other strong consideration is battery life, since DRAM is always running; going from 32 to 64 would be a hit to battery life regardless of workload, but hard to say exactly how big of a hit.
I'm curious, which configuration of the M1 MBP do you have?
I went for the 16" with M1 Max w/32 GPU cores and 1TB SSD (500GB free, I offload most large files to my NAS/iCloud). On the added power usage, my understanding is that's less of a concern due to using LPDDR5?
The only drawback I've found with the M1 Max model is the added weight from the bigger heatsink just makes it a hair heavier than I'd like when picking it up at the front with one hand when open... and that in the winter time the case is cold no matter what you're running, I used to love that my Intel MBP acted as a mini leg warmer :-)
yes, there are billions of parameters necessary. but large language models only came out about 5 years ago. I'm confident that 5 years from now the parameters necessary to get gpt-4 performance will have decreased by orders of magnitude.
at the very least, even if that's not the case, inference will be drastically less gpu heavy by then I suspect.
There will also be hardware improvements (as always) and ASIC chips specifically designed for running this kind of model. For example, see this "Optical Transformers" paper [0] and its HN discussion [1] from last month.
I could also imagine a sort of two-tier approach, where the on-device model can handle the majority of queries, but recognize when it should pass the query on to a larger model running in the cloud.
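A hedged sketch of what that routing might look like; a cheap on-device confidence check decides whether to answer locally or escalate, and every function here is a placeholder rather than a real API:

    # Sketch of a two-tier query router: answer on-device when the local model is
    # confident, otherwise escalate to a larger cloud model. All functions are
    # placeholders, not real APIs.
    def local_answer(query: str) -> tuple[str, float]:
        """Return (answer, confidence in [0, 1]) from the on-device model."""
        ...

    def cloud_answer(query: str) -> str:
        """Return an answer from the large hosted model."""
        ...

    def answer(query: str, threshold: float = 0.8) -> str:
        text, confidence = local_answer(query)
        if confidence >= threshold:
            return text                 # stays on device: private, free, offline-capable
        return cloud_answer(query)      # rare, harder queries go to the big model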
This is an older paper, but DeepMind alleges in their Chinchilla paper that far better performance can be extracted with fewer parameters; quote:
"We find that current large language models are significantly under-trained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant."
It's difficult to evaluate an LLM's performance as it's all qualitative, but Meta's LLaMA has been doing quite well, even at 13B parameters.
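The Chinchilla result boils down to roughly 20 training tokens per parameter being compute-optimal, which makes the "under-trained" claim easy to eyeball (token counts below are the published ones for GPT-3 and LLaMA):

    # Rough Chinchilla arithmetic: compute-optimal training uses ~20 tokens/parameter.
    # Published token counts: GPT-3 ~300B, LLaMA-13B ~1T (per their papers).
    def chinchilla_optimal_tokens(params: float) -> float:
        return 20 * params

    print(chinchilla_optimal_tokens(175e9) / 1e12)  # GPT-3 175B: ~3.5T tokens "wanted", trained on ~0.3T
    print(chinchilla_optimal_tokens(13e9) / 1e9)    # LLaMA 13B: ~260B "wanted", trained on ~1000B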
Chinchilla is aimed at finding a cost-performance tradeoff as well, not the optimal amount of training. If cost is no barrier because it'll be used forever, then probably there's no amount of training that's good enough.
The rumor I've heard is that GPT4 didn't meaningfully increase the parameter count versus GPT3.5, but instead focused on training and structural improvements.
that's a complicated question to answer. what I'd say is that more parameters makes the model more robust, but there are diminishing returns. optimizations are under way
Could you explain how supporting multiple languages increases the parameter count so much? I'm genuinely curious.
LLMs seem to be comfortable with hundreds of programming languages, DSLs and application specific syntaxes so how does supporting a couple more natural languages become so expensive?
I see how more training data would be needed, but I don't understand how that maps to a greater parameter count.
Hundreds of GBs of RAM in a phone is just ~6 years away if Moore's law holds. It's also likely that memory requirements will be shrunk through software and ML improvements.
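For the record, the arithmetic behind that claim, assuming roughly 8 GB in today's flagships and a doubling every ~18 months (a generous reading of Moore's law):

    # If phone RAM doubled every ~18 months (a generous Moore's-law assumption),
    # starting from ~8 GB today:
    ram = 8
    for year in range(0, 10, 3):
        print(f"year {year}: ~{ram} GB")
        ram *= 4  # two doublings per 3 years
    # year 6: ~128 GB, year 9: ~512 GB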
It has been increasing. iPhones went from 128MB to 6GB, and there are Android phones with 18GB now.
RAM hasn't been increasing as steeply as it could, but if there's a strong use-case for it, it may happen. Also consider that Apple is in control of the whole chipset and software, so they could implement things like turning the extra RAM on only during ML computation.
I'm running a totally usable 13b parameters llama model in my macbook air, which seems to give outputs equivalent to what I was getting from GPT3 in June 2022.
How much more hardware would really be needed for GPT-4 level outputs natively? Perhaps software optimizations alone could do most of the trick.
OpenAI and MSFT couldn't figure it out and are burning millions of USD on compute, but Apple will make it run on a phone in 5 years?
Crazy bet, what makes you think that? You cannot optimize infinitely. Raytracing probably had decades of ppl trying to make it run fast and yet even today you need strong hardware
Have a look at what the open-source community is up to. They're reducing models down to mere billions of nodes and then using ChatGPT outputs to train their models for accuracy. The results are surprising.
The reason OpenAI has had to rush out plugins is software like LangChain coming in at meteoric speed.
Things are moving on a day to day basis in the ai sphere at the moment.
They underestimate chip makers' plans to embed AI in silicon.
Chips intended for launch in 5-6 years are in planning stages right now. Apple, nVidia, and Intel could bring serious hurt to software companies in the next 5-10.
Open source purists will weep but really most people do not care, and tech should not merely serve the dedicated.
I'm expecting the Mac Pro to have AI/ML massive GPU modules and/or specialized ML dedicated compute available as add-ons for in-house training. For a not-so-small fee of course. They have all the pieces, it just needs to be brought to bear on the ML training market.
The 4080 is _25x_ faster than the M2 on pure fp32 (which is what most GPUs are doing most of the time). Apple compared the M2 to the laptop 4080, using numbers heavily biased to them (running a 4080 at 10W does tend to make it not perform, yes).
Not a single benchmark in the world has supported Apple's claim that the GPU in the M2 is that powerful. It's just yet another cute embedded GPU that does the job, but nothing more. It's made to push out 8K frames really fast, which it does because of UMA, but any demanding task will have it eaten alive by any real GPU.
ML inference is not generally FP32 anymore. I was going off of the TOPs numbers for ML from a few sources, which generally agree M2 is about 22 TOPS and 4080 (desktop) is about 50.
But in any event, yes, that was my point. UMA is a huge advantage, the GPU itself is too weak to be serious.
But it's a lot easier to drop a dramatically beefier GPU into a new design than it is to update the entire platform for UMA. Apple has a huge opportunity here… whether they pursue it or not remains to be seen.
> whether they pursue it or not remains to be seen.
Pursue what though?
UMA is cool, but kinda meaningless if the majority of Macbooks are min-spec. That leaves you with 4-5gb of VRAM, assuming you've left nothing open. What is Apple going to do with that UMA that other manufacturers cannot?
It's certainly nice that 128gb Macs exist for models that might be too big to otherwise load into memory. It's useless for production inferencing though, and I struggle to imagine the "opportunities" they're missing out on here.
A Mac variant that trades CPU cores for GPU/ML cores while having 192GB+ of UMA memory.
> I struggle to imagine the “opportunities”
Two of them: 1) academic / R&D compute, where people could have at least A6000 class GPU on the desktop, and 2) cloud inference servers, probably for Apple’s own services.
I’m not saying they will or should do those things, just that the apple silicon arch is well positioned if they choose to. Bolting on exponentially better GPU is not especially difficult, and they’ve got an OS that would bring existing apps and libraries right over.
Look at it this way: is there a path to UMA on Windows / Linux? If not, those systems will always duplicate RAM and require users to decide in advance whether to allocate RAM budget to OS or ML.
Academics might buy in, but they're a small market and still fairly easy to poach with quality server hardware. You may be right about Apple using them for cloud inferencing though, seeing how they'd rather pound sand than patch up their relationship with Nvidia.
Whichever way you look at it though, neither of those are really opportunities. Apple boxed themselves into the consumer market, and now has to compete with professionally-priced products.
No it won't. Large language models are trained on 1,000 - 50,000 GPUs. No one's going to buy hundreds of Mac pros to mount them in a datacenter for training ML models.
I mean, it's possible to load LLMs onto a smartphone today. The utility is limited though, and if iPhone 19 still has arbitrary application memory requirements then I think it's safe to say they won't be using the most complete models. At the end of the day, OpenAI's "huge-ass LLM as a service" will probably be relevant longer than you think it will be. Local inferencing might be able to do simpler stuff (eg. compose assistant speech) but this is already possible with pruned models and CPU acceleration.
The bull mindset is fun to watch unfold (especially here on HN) but I think people should temper their expectations.
I never said anything about openai not being relevant. the point is scale and good enough. gpt-4 is already good enough for most reasonable use cases, notably siri optimization. 5 years is a long time. 5 years ago llms didn't even exist.
Gpt3.5-turbo runs laps around siri and is cheap, surely Apple will acquire talent to build something that does remote queries and falls back to local for the next 1-2 years while they figure out on device accelerated LLMs
I don't think they'd even need to go that far to get started. Alpaca-LoRA is better than Siri when it comes to understanding queries and can already run on consumer GPUs. I would be really surprised if they couldn't get something equivalent running on the top-series A-chip devices in relatively short order.
People keep saying this. But it’s only half the problem to process natural language in text form. How well would ChatGPT do at understanding Spanish from a non native speaker with a southern US accent? (Raises hand). I’ve yet to see a speech to text system that does well at understanding speech by non native speakers or even native English speakers with a heavy regional accent.
Almost 5 years ago TalkToTransformer did 80% of ChatGPT's job with 0% of the hype. Once people realize just how glacial all this stuff moves, I think the honeymoon phase will be over.
Definitely possible. A cool idea. But that’s also a lot of credit for “big tech”. Meta, Google, Netflix, and others have produced almost nothing cool for a decade. Apple is one of the few behemoths seemingly capable of actually getting any real work done, so you’re right they’ll be a big contender. Smaller players have less capital, but a lot less blockers too. They can take real risks, move faster, and don’t have all the bottlenecks that come with having 50,000 employees.
> Why would they want to give out their trusty LLMs
OpenAI won’t want to; open source competitors already are, and they will keep getting better. The more of a lead OpenAI has over commercial competitors, the more incentive those commercial competitors will have to back open source options.
Running locally and “give you” are two very different things. Financially you paid for the phone. From a lock in point of view the chip will be proprietary and very hard to reverse engineer. They can indeed hide it behind a local API that apps that they can ban at will would need to use.
Different directions - this is the fundamental challenge. It's hard to know what these directions are when we're flying blind at the cutting edge of technology.
We're all human. I suspect out of the 100, you'll have 95 of them going into the same as before areas. Maybe 5 truly understand the above point and explicitly strike down ideas that have been done before.
GPT-3 has been live for a while now. The industry has hundreds of story generator apps specialized for various kinds of content generation. Very few are really thinking about AGI or chain-of-thought reasoning etc as an example.
So I don't think launching 100 of them into different directions would work. Maybe 5 of them with both the expertise and direction to pursue seemingly impossible goals.
> GPT-3 was said to require something like 150gb of VRAM.
By all accounts GPT-3 is wildly inefficient in resource use; OpenAI runs like a company that’s concerned with the functionality it can achieve by calendar date, and has an almost infinite bankroll to do it. But, other actors in the field have different priorities, and the various open source or, at least, available-to-use models seem to be far more efficient than the OpenAI models of similar function (though they are behind the newest OpenAI models in function.)
I gave Alpaca-LoRA with tloen's tuning a try and it definitely feels within reach of GPT-3. Not quite as good, but some of that may just be whatever's going on with OpenAI's hidden prompts encouraging lots of text out of the model.
Basically the curtain has been drawn back and shows that current LLMs are just very inefficient and will be optimized within weeks. Whatever improvements Apple's Neural Engine offers, it just needs more RAM closer to it, which will likely come with the next hardware refresh; this year is probably too late, but whatever is released at the end of 2024 will be good enough.
Definitely. For some years now the iPhone has had AI stuff on board to categorize and organize your photos (recognising people, creatures, "things", etc), all built into the device - unlike Google, which requires you to upload them first. It's not a stretch at all to have more functionality like Siri run on-device.
Isn't that quite a risky business - if someone manages to leak the model, anyone could run it for free?
I can imagine something similar for the iPhone on the edge: once someone manages to decrypt the model, it will be free for anyone to grab, unless there is some undocumented proprietary mechanism that only works on Apple Silicon.
That's highly unlikely. I think most people can't fathom how huge the scale of GPT-4 is and what computing power it requires. Even with 10x optimizations, 10x performance, and 10x memory (highly unlikely in 5 years), that's not going to be sufficient to run it locally.
I think people are overestimating how many parameters GPT-4 is really trained on.
While also overestimating how many parameters anybody really needs after fine tuning.
The market will find the sweet spot. Right now everyone's tinkering with the 7B-parameter LLMs, and they'll move up to the 65B one once they've refined the process. I think it's fiction that anybody really needs a 10-trillion-parameter LLM at all. It will be completely niche.
Fine-tuning is a solved problem using transformer adapters, which are fast, small, and match regular fine-tuning, so that's not an issue. As for smaller models, their usefulness will quickly wane once they're seen as producing trivial gibberish next to the cloud models available in 5 years.
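To make the "fast and small" part concrete, here's a minimal adapter setup with the Hugging Face peft library (LoRA-style adapters). The model ID is just an example and the target module names are the LLaMA-style attention projections, which is an assumption on my part:

    # Minimal LoRA adapter setup: only the small injected matrices are trainable,
    # which is what keeps adapter fine-tuning cheap to train and tiny to ship.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")  # example ID

    config = LoraConfig(
        r=8,                                  # rank of the low-rank update matrices
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # LLaMA-style attention projections (assumption)
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the full parameter count

The resulting adapter checkpoint is tens of megabytes, which is part of why shipping per-user or per-task adapters on top of one shared base model looks so attractive for on-device personalisation.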
There is no benefit to OpenAI in their tech not needing a data centre. The only ways they have to lock their tech up are safety-and-ethics theatre and the hardware budget required to run it.
if you work at OpenAI and you’re optimizing anything you’re not doing your job right.
This will be the AI killer app for whoever pulls it off first. GPT-4 or higher performance on a device as nice as an iPhone...will create so many new capabilities for smart phones. It's going to be an amazing time for tech in the near future and beyond.
I’m a huge Apple fanboy; I even moved my family up to the Bay Area to work for them for a time. Hopefully you’re right, but if Siri is any indicator, for some reason Apple can’t seem to crack AI at the level OpenAI has proven able to.
it will actually play out differently: Apple will pay OpenAI to run their models on Apple Silicon, and Apple will guarantee the model is protected using hardware keys. Jailbroken phones won’t have access to the local AI.
Apple will never have their tech run AI locally any more than they allow Siri to search the internet offline or while driving. Apple will only build the client as an HTML wrapper or inside an app, but it will always run on Apple servers.
Name something that you find groundbreaking and I too can reduce it down to something mundane.
Leaps in performance across all metrics in an existing thing are groundbreaking, especially when it’s just the beginning.
On a practical level, it feels pretty groundbreaking to me when I go back to use my previously top of the line 16” MBP from a year before. I suspect you haven’t had the pleasure of using an M-series computer.
Something like FLAN-T5 can't run without top-of-the-line VMs, and GPT-4 is an order of magnitude better than that. How will it run on an iPhone, no matter how good the phone gets?
Local inference is a pipe dream. There's no way any of these companies will allow consumers to interact with their flagship models on anything but a black box api.
The black box could well be a local library. Functionally, what would be the difference between using something like AIKit and using a REST API over the network?
Local has a lot of advantages as well (latency, privacy, etc).
You can’t. That’s the joke. If you use CarPlay and ask Siri to find a local destination or a celebrity birthday, the request will be refused, since the driver is assumed to be operating the car and internet search is therefore blocked - even though it will still search Apple Maps, which obviously isn’t cached on the device.
In regards to LLMs, there is a collision between Apple's extremely good chip design capabilities and Apple's insistence that Siri never say anything that isn't 100% scripted and 100% certain not to be bad. Up until now, they've chosen to limit Siri functionality rather than leave anything to chance.
LLMs will absolutely be able to run locally, but whether Apple will be able to stop worrying and love the model remains to be seen.
It would seem to be more risky if the model is out on the device and you can’t correct any screw ups without doing an update.
They’ll have to do something about Siri soon though. Even my 5-year-old daughter told me Siri is ‘a bit thick’. And that’s just compared to Alexa, never mind ChatGPT.
I find it fascinating that this was released after the June 2022 launch of the M2 chipset and line of products, and yet Apple had no desire to show relative performance of M2 vs. M1 here - even in the simultaneous announcement here: https://machinelearning.apple.com/research/neural-engine-tra...
It's fascinating to me that at least one of two things is true: either (a) Apple has lost its ability to coordinate "hype" between its teams, or (b) the difference between comparable levels e.g. the M1 Max vs. the M2 Max are so negligible that they don't look good in an announcement like this.
Has anyone run inference for LLMs or other transformer models on comparable M1 and M2 Macs? Are there good benchmarks for this specific workload?
>I find it fascinating that this was released after the June 2022 launch of the M2 chipset and line of products
And this was put out in Aug 2022. It's very probable that the team worked on it and tested it only on M1, while M2 was kept under wraps by the separate teams working on it until the announcement. So they just wrote the announcement around the chips they had worked on - and since it's not for a commercial product, Apple didn't care to optimize the marketing anyway.
Reminded me of one of the cooler uses of old iPhones.[1] These old phones are going to continue being useful long after their initial life as a phone. As long as Apple doesn't act like Apple and lock everything down.
The bottleneck with compute at the edge is (and will be) model size (both app download time and storage space on device).
Stable Diffusion sits at about 2GB for fp16, Whisper Medium at 1.53GB, LLAMA is 120GB.
Sure, Apple can ship an optimized model (<2-4GB) as part of the OS, but what if a capable app maker wants to ship one? Users will not be happy with an app sized at >1GB.
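The back-of-the-envelope math behind those sizes is just parameter count times bytes per weight, which also shows why quantisation matters so much for shipping anything on a phone. Rough numbers only, ignoring embeddings and packaging overhead:

    # Rough on-disk size: parameters x bits per weight / 8 (ignores overhead).
    def model_size_gb(params_billions: float, bits_per_weight: int) -> float:
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    for name, params in [("7B", 7.0), ("13B", 13.0), ("65B", 65.0)]:
        sizes = ", ".join(f"{bits}-bit ~ {model_size_gb(params, bits):.1f} GB" for bits in (16, 8, 4))
        print(f"{name}: {sizes}")
    # 7B: 16-bit ~ 14.0 GB, 8-bit ~ 7.0 GB, 4-bit ~ 3.5 GB
    # 13B: 16-bit ~ 26.0 GB, 8-bit ~ 13.0 GB, 4-bit ~ 6.5 GB
    # 65B: 16-bit ~ 130.0 GB, 8-bit ~ 65.0 GB, 4-bit ~ 32.5 GB

So a heavily quantised 7B model just about fits the <2-4 GB window; anything much bigger really only works if the OS hosts one shared copy.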
Hyper-casual games are around 300 MB these days; proper AAA games are multiple GB. People still download those, as you can tell by the billions of dollars they make.
The problem with OpenAI's business model is that it's actually quite expensive for them to maintain centralised processing. With Apple, there are billions of very powerful computers already deployed to users, and those computers mostly sit idle apart from occasionally running some bloated JS to show a button or something. If Apple manages to run a good-enough model on device with acceptable performance and energy impact, then suddenly OpenAI and Microsoft are just burning money with no expectation of recouping it if they provide the service for free; if they make it paid, they'll be making money in a niche.
Apple typically solves this with device segmentation.
They can announce an iPhone Pro Ultra model that comes with higher RAM and storage capacity along with a souped up Neural Engine, similar to how they do today with how there are differences between the iPhone and the iPhone Pro screen and camera.
Even better, they could bundle the base model for LLAMA with iOS and ship incremental model updates to those iPhone Ultra users (possibly on a monthly subscription).
I wonder if they’d create a simple version that lives on device that can call on a ‘cleverer’ version for more extensive tasks - like how ChatGPT is using plugins.
Most interactions probably don’t require full power
Yeah, some sort of caching: small models for most tasks, with a way bigger model just one GET request away. It's a smart idea that hasn't really been tried yet. And I bet you could shrink the models significantly - a model doesn't have to know so much if it can Google things, but we don't really know how to train models like that yet (we just want the "common sense" without so much baked-in knowledge).
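Purely hypothetical sketch of that routing idea (all names made up); the interesting design question is where the confidence signal comes from, since today's small models aren't great at knowing what they don't know:

    # Hypothetical local-first router: answer on device when the small model is
    # confident, otherwise fall back to the big hosted model one request away.
    from dataclasses import dataclass

    @dataclass
    class Answer:
        text: str
        confidence: float  # assumes the local model can self-report confidence somehow

    def local_model(prompt: str) -> Answer:
        # Stand-in for the small on-device model.
        return Answer(text=f"[local] {prompt}", confidence=0.6)

    def remote_model(prompt: str) -> str:
        # Stand-in for the big hosted model.
        return f"[remote] {prompt}"

    def answer(prompt: str, threshold: float = 0.8) -> str:
        local = local_model(prompt)
        if local.confidence >= threshold:
            return local.text        # cheap, private, low-latency path
        return remote_model(prompt)  # escalate when the small model is unsure

    print(answer("what's the birthday of that celebrity Siri won't look up in CarPlay?"))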
Yes, by the time iphones can smoothly run llama 30G, the state of the art gpt-x will probably be a terabyte. Skynet will forever live in the cloud with just assistant agents living on the devices.
> Yes, by the time iphones can smoothly run llama 30G, the state of the art gpt-x will probably be a terabyte.
Yeah, but if every individual is running a decent chat-capable LLM, and businesses are running their own on their own devices, and those can communicate with each other, who needs to rely on Skynet?
Research developments are already showing that our models are woefully inefficient in their current state (compare the performance of GPT-3 175B against Alpaca 30B). Not only will hardware get better, the minimum model sizes for good inference will also get smaller in the future.
Apple is artificially shipping phones with low storage sizes to upsell phones with more space. If there were a big economic advantage for them to have larger models on the phone, I expect it would be easy to solve.
This provides optimised hardware acceleration on Apple Silicon for low-level machine learning libraries.
PyTorch is supported for example, it’s a machine learning library with GPU acceleration that’s been around for 6 years now. It’s used in a few commercial projects, including Tesla Autopilot. It can be used for natural language processing, image manipulation, and possibly to build an LLM I suppose, but as a low level library it just gives the base tech to build such systems from.
#2. A way to optimize running LLMs locally on Apple Silicon (including iPhones)
I am just a little better informed. As I understand it, their code improves model performance and memory consumption using PyTorch and Huggingface libraries.
Their examples compared performance across the A-series chips, and the repo includes Swift-only code samples. But they’ve also made it possible to use them with traditional tooling (torch, huggingface).
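For context (this is not Apple's repo code, just the general path I'm aware of): the usual way to get a PyTorch model in front of the Neural Engine is to trace it and convert it with coremltools, then let Core ML schedule it across CPU/GPU/ANE. A minimal sketch, assuming a toy feed-forward block:

    # General sketch: trace a small PyTorch module and convert it with coremltools
    # so Core ML can schedule it on the Neural Engine (fp16 is the ANE-friendly precision).
    import torch
    import coremltools as ct

    block = torch.nn.Sequential(
        torch.nn.Linear(512, 2048),
        torch.nn.GELU(),
        torch.nn.Linear(2048, 512),
    ).eval()

    example = torch.randn(1, 128, 512)           # (batch, sequence, channels)
    traced = torch.jit.trace(block, example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="hidden_states", shape=example.shape)],
        convert_to="mlprogram",
        compute_precision=ct.precision.FLOAT16,
        compute_units=ct.ComputeUnit.ALL,        # allow CPU, GPU, and ANE
    )
    mlmodel.save("feedforward_block.mlpackage")

Whether a given op actually lands on the ANE is up to Core ML's scheduler, which is exactly the sort of thing Apple's reference implementation seems to be trying to coax into behaving well.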
Convenience will be the biggest factor. Whoever makes it easier for the end consumer to get what they want wins. It's why ChatGPT made such a big splash in comparison to all the other AI models which were also impressive. Local inference may play a part in that if it's quicker but if the last twenty years are anything to go by it will be convenience rather than privacy which is the deciding factor. And Apple do tend to make the UX slicker than other companies so I wouldn't put it past them ending up the biggest player.
> it will be convenience rather than privacy which is the deciding factor
This. Nobody really cares about local processing for privacy.
Even those that claim to often don't mean it. Remember the total freakout over Apple's proposed local, privacy-preserving processing to detect CSAM before uploading it to iCloud? The consensus seemed to be that secret, opaque, and un-auditable cloud-based scanning was much preferable.
However, there was enough of an outcry from the community and from experts that it was pulled. So end users do care about privacy when they understand the implications.
After all, it's hard to care when you don't have the power as an end user to act on those feelings.
Do you think the current regime of Google/etc doing ad hoc scans on cloud storage, with no transparency about who is requesting scans for what, is more private?
It was pretty open — auditable and cryptographically proven sources for all triggering hashes, auditable updates. And it was mutually exclusive with cloud scanning, as it was part of E2EE.
About mutual exclusivity, you may be right. I probably misremember that specific detail, but that was before Apple enabled E2EE on iCloud. It was removed as a feature back then when they suggested local CSAM scanning.
"It was pretty open"
Not really. The scanning code wasn't open source. Apple didn't explain how their proprietary hashing algorithm worked, let alone providing any source code. It was just the promise of Apple about how the whole thing would work. If that's sufficient, then iCloud scanning is as open as local scanning.
Auditability of DB updates doesn't mean much either. Even if a third party organization detected "new additions" to the CSAM DB, what would be the next step? There would be no way to verify if those hashes actually correspond to CSAM. A dictatorship, for example, would just say "yes it's CSAM, trust us", and you'd have no option but to trust their word. Even in the US, there is no way to verify if CSAM DB fully corresponds to actual CSAM. It's just NCMEC's word. We simply don't know.
It's a reference implementation, other people are supposed to pick it up and apply the techniques to their own work
I would love to see some links to anywhere it has been used.
So far the most promising codebases for running LLMs on a Mac have been the 'cpp' reimplementations, which ditch PyTorch and run on the CPU, using other tricks to fit the model into available RAM.
As much as I love the progress in AI (also see Microsoft's recent Office Copilot) - I seriously think that government needs to step in to regulate the fair use of training data.
Though this is coloured by my personal beliefs: in order to maximise innovation, I believe it's the role of government to implement regulations that support a competitive commercial environment. Without this, monopolies will form around hard-to-obtain resources, innovation will stagnate, and consumers will be subject to exploitation.
Currently, data is mostly acquired without user consent and is accessible retroactively. Companies own that data, they can trade it and they can use it however they want to. You as a consumer have no say in this and it's virtually impossible to live a normal life without being the subject of analysis.
While it's incredible that companies can produce undeniably valuable products like Copilot - ultimately - they will profit from these products. The irony is they built them from data sourced from you, likely from something you paid for (MS Word, etc).
The key ingredient in these products is training data. If you wanted to compete with them, no matter how capable you are as an engineer, you could never make Copilot without the same scale of data Microsoft has gathered.
I don't know what kind of regulation would even out the playing field, but I wouldn't mind being compensated for my role in creating these highly profitable products.
TL;DR: execution of pytorch models on apple's neural engine and standard data-oriented optimisations (changing matrix layout, chunking to optimise temporal cache locality, and minimising redundant memory copies)
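For the "changing matrix layout" bit, my reading of the write-up is that the reference implementation prefers a channels-first (B, C, 1, S) tensor layout and expresses linear projections as 1x1 convolutions, which the ANE handles better than the usual (B, S, C) layout. A toy sketch of why that's a pure layout change rather than different math (not Apple's actual code):

    # Toy demo: a linear projection over (B, S, C) and the same projection expressed
    # as a 1x1 conv over a channels-first (B, C, 1, S) tensor compute identical results.
    import torch
    import torch.nn as nn

    B, S, C = 1, 128, 512
    x_bsc = torch.randn(B, S, C)                  # typical transformer layout
    x_bc1s = x_bsc.transpose(1, 2).unsqueeze(2)   # ANE-friendly (B, C, 1, S) layout

    linear = nn.Linear(C, C, bias=False)
    conv = nn.Conv2d(C, C, kernel_size=1, bias=False)
    conv.weight.data = linear.weight.data[:, :, None, None]  # reuse the exact same weights

    y_ref = linear(x_bsc)                                     # (B, S, C)
    y_ane = conv(x_bc1s).squeeze(2).transpose(1, 2)           # back to (B, S, C)
    print(torch.allclose(y_ref, y_ane, atol=1e-4))            # True: same math, different layout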
Not sure why Apple is throwing this code over the fence (or catapulting it over the balustrade, to continue the castle metaphor used here) and not also selling server-form-factor Apple Silicon devices for data centers. They left out an "Issues" tab on the GitHub repo, so it is intended to be quite the unidirectional act. I am not sure Apple will remain proportionally huge with respect to the rest of big tech while they squander their development on boutique mass-market products and ignore the vast growth they could achieve by expanding into first-class cloud computing markets.
It's FP16/Int8 inference only (because you can only access it via Apple frameworks that don't support training).
Also, it's only used if your data is small enough (4 MB cache), so it won't be useful for big transformers or big image processing for a while.
As a newbie to this space, I see this mentioning PyTorch. I was looking at Whisper earlier today and, somewhat impressively, was reminded that my M1 Pro has a CPU fan. Is it realistic to think it would be a modest amount of work to install this in my local venv and use the NPU instead?
This is great. I can't wait to try it on my laptop, as I like to do dev locally. But I don't understand the deployment part - besides on-device, how would you deploy this on a server, given that Apple servers aren't something cloud providers generally offer?