i'd say within 5 years apple will have optimized apple silicon and their tech, along with language model improvements, such that you will be able to get gpt-4 level performance in the iPhone 19 with inference happening entirely locally.
openai is doing great work and is serious competition, but I think many underestimate big tech. once they're properly motivated they'll catch up quick. I think we can agree that openai is a sufficient motivator.
I really think you have hit the nail on the head here. Apple has a ridiculous, almost unfathomably deep moat for training and running personalised, customised LLMs and other AI models on the 'edge' with these Apple Silicon chips in all their devices.
We must be talking orders of magnitude differences in operational cost, not to mention completely unique features like privacy. The very definition of disruption, waiting in the wings.
Not only that, but this model of 'wait in the wings and pounce when the tech is really there and absolutely nail it' is Apple's forte, aligned completely with the cultural and strategic DNA of the company.
> Apple has a ridiculous, almost unfathomably deep moat for training and running personalised, customised LLMs and other AI models on the 'edge' with these Apple Silicon chips in all their devices.
Do they? I can completely fathom, given my own anecdotal experiences, how garbage Siri quality is today. I haven't built anything against Siri APIs, but I've used Siri and various integrations, and every single time I give her a shot, she disappoints me. It can do cookie-cutter, super well-traveled code paths that were engineered together for demos, basically just one-off tricks without cohesion, but the whole infrastructure around Siri is not open and has not been improved across many OS versions in any way that I've discerned... But I am not an expert with these systems or APIs.
The Neural Engine architecture and the M1/M2 hardware leaps (which are absolutely mind-bogglingly impressive) are both really superb technologies from my layman's view as a mere software engineer, with improved battery life as well as performance, a real improvement over Intel x86 architectures. I just don't see Apple as a serious AI player right now, despite their head start in AI with the Siri acquisition, and despite these hardware leaps that may make it easier to do cool stuff like this post in the future. The OCR stuff in Photos is cool and useful and I use it every day, usually to look up my Known Traveler Number.
Ramblings over, just wanted to ask you to opine on anything Apple-LLM related if you have any additional context!
- siri is incredibly underinvested. They have another team that's building some sort of search and natural language processing engine, which has slowly sapped away some key headcount from the siri team.
- apple doesn't get the full advantage of tons of user data from the wild. this is both a bug and a feature
- the siri api model is clearly generations old, and i suspect that apple has been marshaling its resources into a big leap forward - alongside their hand tracking, ar, and hardware
- apple has shipped everything required for you to point at a light /in your house/ to turn it on or off (and optionally flick). this includes software - the individual components are built and ready - the only thing missing is gluing it together
There's something happening, and I think that the rumored glasses are the hardware totem.
Lots of things with Apple come out of nowhere. In this case there's actually quite a bit more evidence than usual because of all the fielded, underutilized hardware that's ready to go— when it launches, it'll likely be available on millions of devices, rather than as something you have to get the latest hardware to have, as is the case when the camera or screen undergoes a drastic improvement, or there's a new feature like wireless charging or NFC payments.
> because Apple are probably working on a top secret new thing that will blow everyone else out of the water
that's a mischaracterization of what i said. what i said was that they have been clearly hiring key positions and cannibalizing the siri team for something new. that something new will likely be a major release. i also believe that the ar headset is the unifying product under which they're rallying.
You may well be right, but if so that's a huge problem for Apple as by all accounts the AR device is still many years from being ready for release and meanwhile they are in a very precarious situation with Siri...
The first proper integration of a Whisper and a GPT-class LM will be a big step change: a Siri-like AI that you can actually talk to somewhat sensibly and expect it to "understand" more than pre-set phrases... Google could be in a position to release something for Android, and there are Android SoCs with neural accelerators as well (sure, nothing as impressive as Apple's chips, but Samsung have demonstrated a cut-down SD model on their SoC).
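To make that concrete, here's a rough sketch of what such a pipeline could look like, assuming the open-source whisper Python package for speech-to-text; local_lm_generate is a placeholder for whatever on-device LLM binding you'd use (llama.cpp, etc.), not a real API:

    # Sketch of a Whisper -> local LM assistant loop (hypothetical glue code).
    # Uses the open-source `whisper` package; `local_lm_generate` stands in
    # for whatever on-device LLM binding you have.
    import whisper

    def local_lm_generate(prompt: str) -> str:
        # Placeholder: call out to your on-device LLM here.
        raise NotImplementedError

    def handle_voice_request(audio_path: str) -> str:
        stt = whisper.load_model("base")           # small model, runs on CPU
        text = stt.transcribe(audio_path)["text"]  # speech -> text
        prompt = f"You are a helpful phone assistant.\nUser: {text}\nAssistant:"
        return local_lm_generate(prompt)           # text -> response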
I don't think the new siri is intrinsically tied to the device. I think that as part of remaking siri, they're using the new platform as a rallying cry and story as to why it's important. (it's hard to justify features such as making siri spatially aware otherwise)
Yes, it's been stagnant for so very long that I, and I suspect most users, have learned the very few things that it can do reliably, and no longer even try to do something more sophisticated. If it became more capable, it might take a long while for most users to even notice, because we have stopped hoping it will pleasantly surprise us.
Most people didn't notice, but dictation now runs in a hybrid mode, both on-device and via server. So you get some (maybe poor) on-device results at first (the only results if you're offline, but better than nothing) that get rewritten a couple of seconds later by the server.
I'm not a VR/AR fan, but if Apple can come out with a super high quality AR headset that can augment/replace monitors, I would buy one instantly. I might be earlier than the mainstream adopters, but if the tech ever gets there this will be a natural shift.
I won't buy Meta products but I keep waiting for someone to do VR workspace right - where I can truly work and travel from anywhere with a decent chair and a desk for keyboard (no need to lug around monitors, no need for huge table).
I think that's it. I had half an hour to kill in a cafe yesterday, didn't bring my kindle, and I thought "if there were compact AR or even VR glasses that I could whip out to emulate a full-size working environment, that would be super handy right now".
Having said that, for me to spend money on it, it would have to (a) realistically replace my setup at home, which currently consists of 2 HiDPI screens, and (b) not be that much more expensive than what that cost me.
I would not love wearing them. I waste a significant amount of money on contact lenses each year for the privilege of not having to wear anything on my face.
There is a difference between the subjective and objective "cool." They've sold something like 150 million pairs of Airpods? I think that qualifies as objectively "cool."
> apple has shipped everything required for you to point at a light /in your house/ to turn it on or off (and optionally flick). this includes software - the individual components are built and ready - the only thing missing is gluing it together
Could you expand on this bit? I’m pretty deep in the Apple ecosystem, but I’m not sure what you’re referencing here
Homepods have U1 chips. They can position airtags within your house. You can put an airtag under, or adjacent to, a lamp and link it to a smart switch. Your apple watch has gesture detection (still in beta, tbf), as well as U1 and real time positioning capabilities.
Everything required for apple to know not only where you are, but which way your hand is pointing, as well as where "smart devices" are in your house is already being sold and rolled out en masse to the majority of people in the ecosystem.
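The missing glue is mostly geometry. A toy sketch, assuming you already have a watch position and pointing direction from U1/motion data and a known lamp position (all of the inputs here are made up; none of this is a shipping API):

    # Toy sketch: decide whether the user is pointing at a known device.
    # Inputs (watch position, pointing direction, lamp position) are assumed to
    # come from U1/UWB ranging and watch motion data; none of this is a real API.
    import math

    def is_pointing_at(watch_pos, direction, device_pos, tolerance_deg=10.0):
        to_device = [d - w for d, w in zip(device_pos, watch_pos)]
        dot = sum(a * b for a, b in zip(direction, to_device))
        norm = math.dist(device_pos, watch_pos) * math.sqrt(sum(a * a for a in direction))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
        return angle <= tolerance_deg

    # e.g. watch at origin pointing along +x, lamp 2m away slightly off-axis
    print(is_pointing_at((0, 0, 0), (1, 0, 0), (2.0, 0.2, 0.0)))  # True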
$25 for an Airtag, plus say $20 for a smart switch. Plus $200 for a HomePod.
To turn a single lamp on/off by pointing at it, something that nobody ever wants to do (alright, once for the cool factor). If someone wanted to overpay for useless features, they can already go for a Philips Hue.
I can turn off my lamps from anywhere in the world using a $10 Tuta ZigBee bridge and an $8 LIDL light.
Meanwhile, you could use an iphone as a webcam for years by just buying an app, you could use an ipad as a cintiq by just buying an app, and using your phone with your computer via kdeconnect has also been around for years. Pretty much nobody cared. Then apple came out with their branded version and suddenly you have people praising it as the best thing since sliced bread and great "innovation".
So my prediction would be that apple releases a ZigBee bridge and light for $50 in about 3 years and calls it a fancy name. People will tell you how absolutely essential and innovative it is and how you could never get that without an iDevice.
I think that this is one of those things that sounds like you'd never do it, but...
I've put zigbee switches into every wall switch in my house and now when I stay in hotels, I forget to turn off the lights before getting into bed.
>something that nobody ever wants to do
I think that this gesture, if it works well, is something that will be a killer app for smarthomes. The other point of friction, which is interop, has been more or less solved by thread/matter. Homepods are $100, btw. Not to mention what happens if apple integrates u1 into a smart switch/bulb, which is something that the thread protocol allows for.
Not OP, but I suspect this is a simple combination of AR and precise device location awareness. The former comes in the form of ARKit. I’m not aware of a framework for locating devices, but it wouldn’t surprise me if there were one.
As a mere user this is what I had assumed. Siri has been stagnant and buggy for years; it makes sense they are working on a big replacement instead of putting resources into incremental improvements on what seems to be a weak foundation.
> apple doesn't get the full advantage of tons of user data from the wild.
Do you think OpenAI did? Perhaps Apple doesn't have the vast amount of data that Google has, but if OpenAI managed to have like 200 wikipedia-sized corpuses of different textual data for their English GPT models, that's certainly not out of reach of Apple.
> > Apple has a ridiculous, almost unfathomably deep moat for training and running [LLMs]...
> Do they? I can completely fathom, given my own anecdotal experiences, how garbage Siri quality is today.
Siri is a dead end. When Jobs bought Siri (what, 10 years ago?) he explicitly junked almost all the AI back end, mainly buying the speech recognition engine. I didn't understand why and still don't (but strangely he didn't ask me :-).
John Giannandrea has run Apple's AI effort for the past five or six years. He is the reason Google has a big AI effort (he consolidated a bunch of AI projects and bought DeepMind, etc) before he decamped for Apple. For all I know the Siri team isn't even part of his remit.
You can never look into Apple (even if you work there) so one can only speculate based on what visible signs appear. But Siri isn't one of them.
The cinema industry knows how to squeeze a billion dollars a year out of Apple. Do the Apple or Sony "social engineers" have the total view? When Amazon was skyrocketing and Blue Origin had not reached orbit, a niche producer from the cinema industry had Jeff B paying a lot, although he might have backed out of that deal, like giving up on the Android phone with a lot of cameras.
The part that Siri is bad at will be commoditized — someone will open-source a GPT-4-level language model. And Apple's moat will be being able to run that on-device with all the attendant benefits (privacy, zero marginal cost to the company, availability in more scenarios, etc)
You realize that these language models are like hundreds of GBs in size and consume tens of GBs of memory. Last time I checked, apple still ships their products with less than the market average in both of these specs. If you want a local running LLM on an iphone, get ready to sell a kidney.
You can today run an LLM vastly better than Siri on a few GB of RAM using Llama 7B at 4-bit quantization and alpaca.cpp. This is moving so fast, every day there is something new coming. There won't be any moat in LLMs soon or even in dedicated HW as it turns out you don't need that much for "basic intelligence".
Note I'm not suggesting you can pack the full knowledge base of humanity into those 2GB of RAM, but the key feature of an edge AI is simply to understand instructions, something Siri and OK Google struggle with at best...
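Rough back-of-the-envelope numbers for why a few GB is enough (ignoring KV cache and runtime overhead, so real usage is somewhat higher):

    # Approximate weight memory for a 7B-parameter model at different precisions.
    params = 7e9
    for bits in (16, 8, 4):
        gib = params * bits / 8 / 2**30
        print(f"{bits}-bit: ~{gib:.1f} GiB")
    # 16-bit: ~13.0 GiB, 8-bit: ~6.5 GiB, 4-bit: ~3.3 GiB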
(assuming we're not talking about the near future)
I think this can be a scenario of converging incentives: on one side, large models will incentivize hardware manufacturers to increase the memory available on devices, while on the other side, model developers will be incentivized to trim the fat on the models and devise compression mechanisms that don't compromise quality too much.
It's not unthinkable to imagine a handheld device able to run full inference locally a few device generations in the future.
yeah, makes sense. At least these integrations at a lower level are happening now. I just can't help but feel disappointed that we didn't see earlier cohesion at least a few years ago, maybe around the early or pre-M1 era when there were all those A7, A8 whatever SoCs. But in retrospect, that is basically 10 minutes ago in the history of the universe so yeah, I'm just overly excitable!
Alexa is DIMENSIONS better than Siri. Siri can create a timer ... okay even an alarm. That’s it.
It is comically bad. Their text to speech is excellent, but the rest is unbelievably bad.
If Apple has some kind of silver bullet, it’s time to put it out or be left behind.
I don't have all the details because of NDAs, but I ask questions and get nods and grins from colleagues and friends at chip companies; Apple's (and Nvidia's and Intel's) silver bullet to cloud-hosted, software-based AI is AI chips.
We’re circling back around to local compute being the default as hardware performance of next gen phones and tablets reaches a “good enough” point for most users.
There will be scientific problems that will require modern server clusters but most consumer facing AI needs will be done on hardware within a decade.
I’m not saying AGI in a decade, I’m saying cutting edge logic embedded in software now will be the basis for logic in chips in the years to come.
Honestly I observe zero progress in the last 10 years. My first iPhone was 4S. My current iPhone is 8. My use-case is absolutely the same. Calls, SMS, Whatsapp, navigation, books, browser. The same with macs. I boot, open IDE, Terminal and that's about it. Yes, hardware's getting faster (which is compensated by software getting slower). But nothing changed dramatically.
I think that the only thing that changed is adoption of fingerprint reader. Both in iPhone 8 and in mac.
AI might be that thing that could tremendously change my usage pattern. But it should be as smart as human assistant. GPT4-level intelligence might have the necessary power for that.
I don’t think Apple necessarily needs to lead the pack on LLM research. They just need to take the SOTA FOSS model and bake it into silicon, using their chip design expertise.
If you do that, you can do inference on mobile devices, which is a huge privacy win; it plays into their general privacy positioning in a big way. If you open up that SDK to developers it would be the iOS app gold rush all over again.
Note, Google is also moving in the direction of on-device inference with Coral, but they obviously will want to transmit the personalized model weights back to the mothership.
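To be fair, the developer path for on-device inference already exists today via Core ML; a minimal sketch of converting a toy PyTorch model so Core ML can schedule it onto the Neural Engine (the model and shapes here are placeholders):

    # Sketch: convert a traced PyTorch model to Core ML so iOS/macOS can run it
    # on CPU/GPU/Neural Engine. The model is a trivial stand-in.
    import torch
    import coremltools as ct

    net = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU())
    example = torch.rand(1, 128)
    traced = torch.jit.trace(net.eval(), example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example.shape)],
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.ALL,  # let Core ML use the ANE when it can
    )
    mlmodel.save("TinyNet.mlpackage")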
Honestly, it speaks volumes about Apple as an organisation that they did not invest all of their resources into Siri, especially when you consider how Amazon invested in Alexa etc. Everyone has long understood that the "personal assistant" represents the pinnacle of what not just AI but the personal computer should be able to offer -- the mistake this past decade was thinking that had to do with then-current ML paradigms or with these completely useless voice agents. That Apple could realise it was a dead end and not sink investment/engineering resources into the tool any further than what anyone actually uses these tools for (changing the song, lowering the volume, etc) is outstanding.
Ultimately, I'm very reassured that it's Apple and Microsoft leading the way for personal and business AI assistants respectively. As Stratechery has emphasised, the latter not only already has all the data from most firms outside Silicon Valley, but more importantly has always adhered to and designed their software according to a computer-as-a-tool philosophy versus Google's "ML/AI will do everything for you behind the scenes"; the former has not only made a show of data security but invested massively in the hardware architectures necessary to making on-device AI a possibility. Without at all being a "fanboy", Apple is quite literally the only company I would trust my personal data to for use in a GPT-based assistant.
Amazon's investment in Alexa is a perfect example of why Apple didn't (and shouldn't have) invested all of their resources into Siri. The Alexa team is getting seriously gutted in these rounds of downsizing [1], or as CNBC puts it "the team behind the technology was a prime target of the largest layoffs in the company’s history."
What they have invested heavily in is the Apple Neural Engine ("ANE"), special silicon right on the SoC to handle ML / AI code. Optimize on a server, then run the model on your iPhone or, probably soon, your Apple Watch.
WWDC this year is going to be very, very important.
Siri is garbage because it's trying to be a real AI, as it was understood to be at the time. Alexa works because it's not an AI but it seems as if it's an AI.
If you're the exec in charge of this at Apple, what are you doing right now? It's not watching OpenAI become what Siri could be, not if you have something to do about it. Your motivation levels are extremely high.
Maybe, but Apple doesn’t have a search to rival Google (or even an assistant, given the state of Siri).
Focusing on privacy and on-device learning is great, but when the strength of these models is in consuming all the data they can hoover up your motive is at odds with your philosophy.
Apple gets $18B a year from Google to be the default search engine on Apple devices. Why would Apple build a search engine that would probably be inferior and not be able to monetize it?
I only use Siri with the original Homepods. It has always been terrible. You cannot even tell it to start playing movies on the TV that you have already purchased from Apple itself in the TV app. It will start to play some random song or something.
I should be able to say hey siri, start playing <name of media> on <name of Apple TV>, and it should be able to start the TV and start playing it.
If you have the magic phrase to make it work, I'd like to know. I tried for about five minutes while driving yesterday, to no avail.
For me, Siri is like an 80s text adventure game, except I was better at those.
Edit: just now I was able to call by interacting a second time with Siri, and using the physical button. Using the button never would have occurred to me while driving.
The perfectly obvious use cases Siri overlooks are maddeningly numerous. My go to example was "text this photo to <contact name>" For years I checked each new iOS release to see if that was enabled, and for close to ten years, no. I'm not sure whether it was the most recent, or the one before (I gave up somewhere along the way) but now you can (finally) do it.
I work around this by asking it to find Toyota in <my city>. Then if it finds the dealership, it asks if I want to navigate there, or call them.
I find the worst errors to be when I ask it for information, or to send a text, and it instead places a phone call to someone. It will even call people that I have never called on my phone (a fact it should know), without asking first.
And AppleTV itself is hot garbage for this kind of stuff even though it's a veritable supercomputer that should be the hub for Siri and home automation.
- Hey Siri, <do something that involves AppleTV in any capacity>.
And that has nothing to do with an LLM. That has to do with speech-to-text algorithms. It also doesn't take advanced AI to do simple intent-based answers.
They have every query their users enter into Spotlight. They have all the results it returns (who cares if they originated from Google, ok maybe they do, maybe not using it for anything is part of their deal with Google). They have your contacts, calendars, messages, email, music, workout history, tasks, real time location, which Siri already uses to e.g. recommend a destination when you get in your car to a degree of accuracy that straight spooks me sometimes.
Apple has access to SUBSTANTIALLY more fine-grained personalized data than Google does. Full stop. That's a weird thing to assert in the tech crowd, but think about it deeply and you'll realize it's true.
Google has islands of services that they've done an extremely good job of building bridges between. They have some islands that Apple has nothing like (YouTube is the biggest one) (Search is a huge island, but only defensible if the interfaces customers use to access it don't leak the queries and results; no one opens a browser and types "www.google.com" on mobile, and even if they did, Apple controls the browser renderer, they could lift everything if they wanted to, they won't, I'm simply illustrating how much Apple should fuckin scare Google). But the waters between those islands are patrolled by Samsung, Xiaomi, Oppo, and Vivo, and there's sharks in there as well (uh, the metaphor is falling apart, the sharks are "Google's tumultuous history with privacy and user blowback if they overstep").
Apple is a continent. Many of their cities are ports that connect to some of Google's islands, but Google's product still has to be checked by customs.
You're right that even given that power, hoovering it up is at odds with Apple's philosophy. Or is it? I think "hoovering it unencrypted to the cloud" definitely is; but that's the point OP was making: if we're extremely close, as a species, to solving "make AI work", one of the challenges for the next five years is inevitably going to be "make it more personal". It's awesome that I can ask ChatGPT to write a date comparison function. It'd also be awesome if I could ask Siri "when did sarah and I talk about getting a cat" or something.
That requires personalized data to set the context. If Apple can swing for the fences and say "everything is on-device, nothing leaves, we can't see it, and now you can ask Siri that, and by the way there's new functionality built into iOS that apps can leverage to integrate with Siri's new LLM capabilities just like ChatGPT plugins", that's an extremely compelling product. Extremely. And I know they would ingest that data, that they would do that, because they already do! Go ask Siri to call Sarah, or if you have any meetings tomorrow (assuming you're using Apple Calendar), and it will respond. I don't know where you're getting this take that they don't "hoover data"; they ingest everything from all their first-party apps into Siri's `DATA MATRIX`. It's just, you know, 2010s-era querying and data crunching.
Don't get me wrong, Google's gonna do a lot of this too. But Google is playing from the position of "we have a model, we have the data, we just need to pay hundreds of millions of dollars a year maintaining all these servers". Apple is playing from the position of "our customers are paying us for the silicon to run this, we have the data, we just have to figure out the model". If it's not obvious at this point: the models/algorithms/etc are not a moat. They're going to be commoditized with time. Publicly accessible data isn't a moat. Personal data is a moat, because people care about privacy. And cost-effective training and inference silicon is a moat, because it's Physical, and like literally One Company on the planet makes it, and they're in a big time situationship with Tim Cook.
Yes you do. It trivially does. Search for "Burger King" and see how many results that generates which couldn't possibly be served locally (undownloaded Apps from the App Store, Maps, search recommendations, etc).
They may not be storing these queries or using them for further analytics, training, etc. But they could.
TSMC manufactures the silicon for nearly all of the current generation high end silicon. The only exceptions I've been able to find are, naturally, Intel's CPU lineup (which has nothing to do with AI) (and, to be clear, Intel Arc is also manufactured by TSMC), and the Snapdragon 8cxg3, which is manufactured by Samsung; and the "advanced-ness" of the 8cxg3 is pretty miserable compared to A16, M2, RTX 4000, RDNA3, or Tensor Processing Units.
Apple has purchased the entirety of TSMC's 3nm production for the next 1 to 2 years. They can do that, and can continue to do that, because they have a ton of money. They have a ton of money for buying chips because there isn't some crazy business middle-logic justifying the cost of these chips' performance; they buy chips, they sell chips to consumers. In comparison, literally zero other customers of TSMC derive the majority of their hardware revenue from selling to consumers. Companies like Nvidia make some money this way, but most of their money comes from data center sales, which have their own business justification for buying that fluctuates (ChatGPT subscriptions? training models? is the past model good enough? etc).
I think this is wrong because you can always build a search engine - it’s not hard - and you can supplement the learned models with cryptographically secured private data to build a private personalized model. In fact this seems in all ways superior to any alternative, and strikes at the heart of the Google spy ware ad spam business model.
And seriously, did you really try to assert that the Google whatever-it-is has outperformed Siri? I'd love to meet someone that uses it so I can verify this statement. But with the advent of ChatGPT, Alexa, Siri, and Google whatever all look like freshman projects. The next generation of assistants has just begun, and I assure you, the state of the past has nothing to do with what comes next. OpenAI hit a giant reset button across an awful lot of stuff.
I have both and find that 90% of the time my commands are “remind me ___”, “wake me up at ___”, “what’s the weather today”, and “play ___ on Spotify”. For these, I can’t tell the difference between google and Siri in terms of quality (but Alexa was worse).
I was just setting reminders today and had Siri repeatedly fail to do what I want. The only use case I have for it is setting timers, which it does very well. I'd use it for much more if it was as good as ChatGPT.
This is entirely my point. They’re all crap at a fundamental level, none of them outperform any other. But the next generation, which will assuredly be backed by a LLM, will be shockingly powerful.
The MKBHD personality is onto Monte Carlo testing in his device reviews. He has an intuition for good UIX. Google and Apple should pay people like MKBHD and his production crew a billion dollars every year to film achievable, desirable text, voice, camera A.I. assisted use-cases.
They don't work like that though. Instead they spend $3b / year on hiring fungible cog SDE 2 job code 67483's and look at gantt chart roadmaps produced by fungible cog PM 3 job code 74842's for fungible cog SDM 7 job code 84747's with the nominal assistance of centralized (i.e., marginalized) fungible cog UX 5 job code 35563's, then wonder why it's perennially late and complete garbage. Except no one wonders that because fungible cogs don't have wonder in their job responsibility matrix.
Yeah, Apple is slower to adopt some things, but I noticed when they hit it... they don't just knock the ball out of the park... they elevate the entire game.
I dominated a 13" mbp m1 for like two solid years on hardcore startup software engineering. The base model. It was almost entirely perfect until the last few months when I upgraded to the new 16".
These are alien devices. They're just not possible yet here I am using one.
Siri was supposed to become locally processed but sometimes can’t even set a timer because of a connection time-out. Or she’ll use her fancy ML speech recognition model to turn “set a timer for 3 minutes 10 seconds” into “search for trinity tensor”.
Siri is so bad it needs to be scrapped and rethought from the ground up (it can’t even give me the time when internet is down at my rural house). No one at Apple who wants a career there would dare propose that. The switch to a LLM architecture could be the perfect transition point for this.
If Siri can't give you the time without the internet, I think you need to update iOS.
There are definitely two tiers of Siri queries. There are queries like "set the brightness to 10%" or "set a timer for 5 minutes" which absolutely and consistently work without internet, and have for several years, and if you're legitimately having a different experience then it's possible a cosmic ray hit your iPhone (or, realistically, you're running into a strange and rare bug which is not indicative of the general experience and will be fixed in three to five years, or maybe never). There are also queries like "create a new note" (from the Notes app) which should be able to work offline, but don't. And, naturally, there are queries like "when did Resident Evil 4 come out" which wouldn't reasonably work without internet (but, if you're curious, she does get it right).
In other words, Siri clearly does some local guessing as to whether she can answer a query without the internet, and some queries appear to be miscategorized into the second bucket. My leading theory on why this happens, which may be incorrect: if an app has any Siri functionality which requires the internet to answer, all of the queries which are responded to by that app have to require the internet. It doesn't matter where the processing ends up actually happening to respond to the query, it just shuts the query down. It's weird, but it's consistent with the behavior I've seen.
The more important point: Siri's real and weird limitations don't seem to have much to do with limitations in local processing. They've said that the speech interpretation all happens on-device. They encrypt practically all of your data that does get shipped to their servers, so a query like "Open the note titled 'Hello World'" is probably also being processed on-device. But: Siri still requires the internet for that query. That doesn't seem like a significant limitation with their ML algorithms or silicon or anything meaningful; it seems like just a case of dumb coding, which is certainly something Apple is no stranger to.
Exactly. You can see Apple struggle to build anything of quality with their software stack. Good luck making anything from this. So much great HW sadly wasted.
In addition, Apple is notoriously slower at adopting the most cutting-edge technologies, but historically they tend to consider things very thoughtfully and roll them out in beautiful launches (not always).
Examples would be the fact that Samsung almost always beats them to the punch on camera technology and other whiz-bang features, but Apple eventually adopts them to much mass-market consumer acclaim and groans from Android techies. Another example is they're just now considering touch screens on laptops.
A counter example is the TouchBar - which was “innovative” but many didn’t like.
> Apple has a ridiculous, almost unfathomably deep moat for.. running personalised...
Apple has decent hw but no LLM software. Unfortunately, in LLM space software changes are the ones driving performance for now, since the space hasn't stabilized yet. Since their competitors control the software, they get to adapt it to their hardware. That is, Google and Microsoft are going to adapt GPT/Bard to Qualcomm ARM etc. while Apple is being ignored. Unless Apple gets in with their own LLM (quite possible), their hw advantage will end up not mattering one bit.
They might not even need to train their own. There are many open implementations of models that don't have an academic-only restriction. Right now you can use tensorflow.js, which runs on a phone's GPU using WebGL. If you build the API out, developers will use it.
iCloud has screamed ineptitude for decades. I don't see how Apple can possibly develop successful products that can't be held in the hand and fetishized by myopic designers. Much like a stereotypical grandparent raised during the Great Depression, Tim Cook still leads like a supply chain miser trying to save a company that has long prospered -- they tragically squander the unprecedented capital at their disposal. Change my mind, Apple please.
OpenAI's relationship with Microsoft pretty much makes it big tech. I think you have a point as long as Apple keeps innovating its hardware, but it's not really a David vs Goliath situation.
Nvidia is also no David to Apple’s Goliath. They are both Goliaths who will most likely be battling to build the best hardware for LLMs in the near future.
There may be a short-term panic to get hold of CUDA GPUs, which will allow NVDA to increase their prices a lot. But then other chip makers and software creators will build alternatives to NVDA.
Look at AMD vs NVIDIA RTX implementations.
Nah, NVDA is a solid buy for long term. They also almost acquired ARM (Which shows where their heads are at at the very least.)
Owning ARM, the thing that is inside ya know, everyone's phones, routers, etc
You can already run stuff like it on mobiles, for example alpaca.cpp with Llama models runs fine on the CPU on any device that has a few gigs of RAM. Getting to even larger models using weight quantization, distillation etc. isn't 5 years away, it's 5 weeks or maybe 5 months at most...
Btw transformers are really simple and the optimisations making them run fast on CPUs have come a long way. I don't know the M1/M2 benchmarks for this, but many CPUs for edge devices have NN accelerators on the silicon that can run this, it's the pairing of the accelerator and gigs of RAM that is the key.
Greed is a fabulous motivator but it can't surpass the laws of physics. This is probably why we don't have a cure for cancer or room temperature superconductors or jurassic parks.
It may be that scaling a language model like GPT-4 down to hardware systems orders of magnitude smaller than the ones used by OpenAI is simply not possible.
I'm not saying it's impossible but it's fallacious to just assume outright that it's inevitable because market forces.
For all we know it could turn out that the only way to get a GPT in pocket format is through some form of analog chips that must be individually trained, which Hinton aptly described as an era of mortal computing.
There is nothing impossible about a Jurassic park in terms of laws of physics though. It's a matter of getting the right materials to recreate dinosaurs.
Same goes for cancer cures. I'm not sure if there are physical limitations preventing a room temp superconductor.
> I'm not sure if there are physical limitations preventing a room temp superconductor.
Not really. We just know that it’s very uncommon at best. The limit is material dependent and we already have superconductors with more than one order of magnitude difference in their critical temperature (e.g. ~4 K vs ~40 K; YBCO, which is widely studied, is at 90 K). It is not inconceivable that we could come up with some fancy material with a critical temperature three times as high again. There are several laboratories with good money working on it.
This would be awesome. I don't like having to rely on closed-source cloud services for LLMs.
But what incentive does Big Tech have? Especially considering that presumably they could monetize the cloud services easier. It's not like most consumers care either.
I expect smaller companies like Stability AI and grant/govt-funded research to bring cheap, local inference. There's definitely a lot of demand. But is there more demand than for cloud-based services, so that it's economically viable for the biggest companies (Apple)?
i'd say apple is one of the few companies that could charge people for an "AI embedded" 128GB module, let people pay for a subscription in order to "access" it, yet have the inference happen locally.
one hypothetical is that you need an icloud subscription, a token is retrieved from apple, the token "unlocks" your AI module, on the phone and allows you to do the inference.
in this way apple could charge monthly for this and claim that the inference happens locally. sadly in this way it would be similar to the whole csam debacle
apple is also one of the few companies that could realistically get manufacturers to massively produce a hypothetical "AI model chip" that has the model on device, at a quantity that would make it realistic to pay for in a hypothetical iPhone 19 Pro model.
apple fitness I imagine has a pretty small opex (sure they pay trainers, but that doesn't scale linearly with subscriptions. the videos themselves are limited in scope and could be cached pretty trivially. the actual fitness tracking happens on device), same with apple arcade, which I highly doubt has any additional expense over the app store in general.
I can see it - "Siri+", pay $5 a month for fine tuned model upgrades straight to your device and remote fallback. local inference available for Pro devices only.
If you look at Peloton's financial statements, you'll see that your assumptions are off. For one thing, music costs are significant (Peloton had to settle a $200M lawsuit for not doing it right).
In addition to the trainers, you have producers, graphic designers, artist spotlights, celebrity interviews, etc.
Yes, some costs are fixed so with more subs you get higher profitability but that’s really different than charging for something that doesn’t have a variable cost.
You don't need a subscription to track and view your data (I've got an apple watch and no subscription). I think all the subscription gets you are the training courses (videos, but also live events and stuff)
my point was more around how you need an apple fitness+ subscription to do relatively trivial things like show your health metrics on a TV (which you also need an apple TV for).
OpenAI IS big tech at this point given the massive financial and infrastructure investment from Microsoft. I think we're well past the small underdog narrative.
Apple M series chips are already some of the best chips for running your own models locally, especially if you want a laptop or small form factor machine instead of a big gaming desktop with a big GPU.
Apple is really getting a niche here for machines to run models locally. That’s pretty powerful.
> Apple is really getting a niche here for machines to run models locally.
I... don't agree. They have an acceleration API and a large install-base, but all of these models have run just fine on traditional hardware. GPT-Neo, Stable Diffusion and LLaMa can all be used and accelerated without Apple Silicon.
Powerful, maybe. But not really unique, just putting up table stakes.
Honestly, I've looked at shorting MS due to openai.
Their technology (LLMs), or the secret sauce, can easily be stolen just by the process of putting that tech out there.
Have a look at Alpaca: FB made the underlying model, someone leaked the weights, and now there's a dataset for training it for only a few hundred dollars that can beat openai at its best.
Not everyone needs to employ a PhD for doing customer service, in the same way not everyone needs GPT5 for answering support queries.
TBH alpaca is nowhere near chatgpt or even the gpt models in its capability. I have been playing around with llama for a while now, and even with chains of prompts, the quality of answers is really bad.
I had thought that Facebook would have the ability and competence to produce an LLM which could outclass open ai
I also believe this is going to be the case. The past decade apple has positioned itself as a privacy-focused + good hardware company. The next is going to be offline AI + privacy + good hardware & I am already sold on the idea.
Agreed. Apple is fantastic at waiting for the moment a technology is mature enough to be useful to everyday people and then jumping on it. They have a fantastic CPU department that no one else can match, and certainly must have a skunkworks AI team. They'll let everyone else make the mistakes, learn from them, and release a great product.
I'm similarly skeptical, but that said, I'm running 30B parameter LLMs on my 32GB M1 Macbook Pro every day now. The trick is quantising them down to 4 (or even 3) bits; it's possible to massively reduce the memory requirements. Have a look at [1]
The devs working on llama.cpp have been discussing ways to further reduce the memory requirements by mmapping the large weights files (I thought LLMs mutated the weights as they run inference, but they clearly know more than me about the internals), bringing it within reach of phone memory.
So, iPhones are not as far off the computational capacity to run these models as you'd think. Memory (and to a greater extent, battery and cooling) are the limiting factors. iPads even less so, given they run M1 chips and have much larger batteries & much more RAM
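The mmap idea is worth spelling out: instead of copying the whole weights file into RAM up front, you map it and let the OS page in only what inference actually touches. A toy illustration with numpy (the file name and layout are made up; llama.cpp does the real thing in C/C++):

    # Toy illustration of memory-mapping a weights file instead of loading it.
    # "weights.bin" and its layout are hypothetical; the point is that pages are
    # only faulted in when touched, so resident memory can stay well below file size.
    import numpy as np

    w = np.memmap("weights.bin", dtype=np.float16, mode="r")
    chunk = np.asarray(w[:1_000_000])   # only this slice gets paged in
    print(chunk.mean())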
Off topic, but for what purpose are you running LLMs locally (especially every day)? My understanding was that the prompting required to make them work at all was too great.
A little bit of research, a little bit of actual useful tasks - I'm interested in summarisation, which alpaca is decent at (even compared to existing summarisation-specific models I've tried)
My other motivation is making sure I understand what offline LLMs can do... while I use GPT-3 and 4 extensively, I don't want to send something over the wire if I don't have to (e.g. if I can summarise e-mails locally, I'd rather do that than send them to OpenAI).
It's also surprisingly good at defining things if I'm somewhere with no internet connectivity and want to look something up (although obviously that's not really what it's good at & hallucination risks abound)
On alpaca, I've found "Below is an instruction that describes a task. Write a response that appropriately completes the request. Summarise the following text: " or "Give me a 5 word summary of the following: " to work fairly well using the 30B weights.
It's certainly nowhere close to the quality of OpenAI summarisation, just better than what I previously had locally (e.g. in summarising a family history project with transcripts of old letters, gpt-3.5-turbo was able to accurately read between the lines summarising an original poem which I found amazing).
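(For anyone wanting to reproduce this: a rough sketch of how that instruction slots into the standard Alpaca prompt template used by the alpaca-lora projects; the input text below is a placeholder.)

    # Sketch: building an Alpaca-style summarisation prompt. The template is the
    # standard one from the alpaca-lora projects; the input is a placeholder.
    TEMPLATE = (
        "Below is an instruction that describes a task, paired with an input that "
        "provides further context. Write a response that appropriately completes "
        "the request.\n\n"
        "### Instruction:\n{instruction}\n\n"
        "### Input:\n{input}\n\n"
        "### Response:\n"
    )

    prompt = TEMPLATE.format(
        instruction="Give me a 5 word summary of the following text.",
        input="<text to summarise goes here>",
    )
    print(prompt)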
I half wonder if the change in spelling from US -> UK makes a difference...
I'd run a test on that but I've just broken my alpaca setup for longer prompts (switched to use mainline llama.cpp, which required a model conversion & some code changes, and it's no longer allocating enough memory)
Off topic slightly, but are you running into limits with 32GB RAM that the 64GB model would meaningfully be adequate for? Do you wish you had one of the larger RAM models?
I've been pretty happy with 32GB, but the 30B models do push near the limits. I don't see a big difference between the quality of 65B (running on a 64GB x86 host) and 30B on M1 (although that may be the 4-bit quantisation, so take that with a grain of salt). I'm just glad that I have it on an M1... I have a 3080 in my PC, but when I got that I was thinking more of Stable Diffusion and YOLO tasks rather than LLMs, and it just doesn't have the VRAM for LLMs.
Alpaca seems like it could be significantly improved with better training (some of the old training data was truncated), so I think there's a decent amount of improvement to be had at the current model size.
In the future though... what would really be a meaningful change would be a larger context size - the 8k tokens of GPT-4 was a big improvement for my uses... I would guess a future local llm with larger context would exceed 32GB, but that's speculation beyond my expertise, I don't know how context size and network size scale.
If it was a PC I'd say go for 64GB, but hard to recommend that given how much Apple charge for RAM upgrades. On my next upgrade (2+ years time, hopefully) I'll likely opt for 64GB+ though
Yeah, it is expensive. My other strong consideration is battery life, since DRAM is always running; going from 32 to 64 would be a hit to battery life regardless of workload, but hard to say exactly how big of a hit.
I'm curious, which configuration of the M1 MBP do you have?
I went for the 16" with M1 Max w/32 GPU cores and 1TB SSD (500GB free, I offload most large files to my NAS/iCloud). On the added power usage, my understanding is that's less of a concern due to using LPDDR5?
The only drawback I've found with the M1 Max model is the added weight from the bigger heatsink just makes it a hair heavier than I'd like when picking it up at the front with one hand when open... and that in the winter time the case is cold no matter what you're running, I used to love that my Intel MBP acted as a mini leg warmer :-)
yes, there are billions of parameters necessary. but large language models only came out about 5 years ago. I'm confident that 5 years from now the parameters necessary to get gpt-4 performance will have decreased by orders of magnitude.
at the very least, even if that's not the case, inference will be drastically less gpu heavy by then I suspect.
There will also be hardware improvements (as always) and ASIC chips specifically designed for running this kind of model. For example, see this "Optical Transformers" paper [0] and its HN discussion [1] from last month.
I could also imagine a sort of two-tier approach, where the on-device model can handle the majority of queries, but recognize when it should pass the query on to a larger model running in the cloud.
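A hedged sketch of what that routing might look like; a cheap on-device confidence check decides whether to answer locally or escalate, and every function here is a placeholder rather than a real API:

    # Sketch of a two-tier query router: answer on-device when the local model is
    # confident, otherwise escalate to a larger cloud model. All functions are
    # placeholders, not real APIs.
    def local_answer(query: str) -> tuple[str, float]:
        """Return (answer, confidence in [0, 1]) from the on-device model."""
        ...

    def cloud_answer(query: str) -> str:
        """Return an answer from the large hosted model."""
        ...

    def answer(query: str, threshold: float = 0.8) -> str:
        text, confidence = local_answer(query)
        if confidence >= threshold:
            return text                 # stays on device: private, free, offline-capable
        return cloud_answer(query)      # rare, harder queries go to the big model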
This is an older paper, but DeepMind alleges in their Chinchilla paper that far better performance can be extracted with fewer parameters; quote:
"We find that current large language models are significantly under-trained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant."
It's difficult to evaluate an LLM's performance as it's all qualitative, but Meta's LLaMA has been doing quite well, even at 13B parameters.
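The Chinchilla result boils down to roughly 20 training tokens per parameter being compute-optimal, which makes the "under-trained" claim easy to eyeball (token counts below are the published ones for GPT-3 and LLaMA):

    # Rough Chinchilla arithmetic: compute-optimal training uses ~20 tokens/parameter.
    # Published token counts: GPT-3 ~300B, LLaMA-13B ~1T (per their papers).
    def chinchilla_optimal_tokens(params: float) -> float:
        return 20 * params

    print(chinchilla_optimal_tokens(175e9) / 1e12)  # GPT-3 175B: ~3.5T tokens "wanted", trained on ~0.3T
    print(chinchilla_optimal_tokens(13e9) / 1e9)    # LLaMA 13B: ~260B "wanted", trained on ~1000B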
Chinchilla is aimed at finding a cost-performance tradeoff as well, not the optimal amount of training. If cost is no barrier because it'll be used forever, then probably there's no amount of training that's good enough.
The rumor I've heard is that GPT4 didn't meaningfully increase the parameter count versus GPT3.5, but instead focused on training and structural improvements.
that's a complicated question to answer. what I'd say is that more parameters makes the model more robust, but there are diminishing returns. optimizations are under way
Could you explain how supporting multiple languages increases the parameter count so much? I'm genuinely curious.
LLMs seem to be comfortable with hundreds of programming languages, DSLs and application specific syntaxes so how does supporting a couple more natural languages become so expensive?
I see how more training data would be needed, but I don't understand how that maps to a greater parameter count.
Hundreds of GBs of RAM in a phone is just ~6 years away if Moore's law holds. It's also likely that memory requirements will be shrunk through software and ML improvements.
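For the record, the arithmetic behind that claim, assuming roughly 8 GB in today's flagships and a doubling every ~18 months (a generous reading of Moore's law):

    # If phone RAM doubled every ~18 months (a generous Moore's-law assumption),
    # starting from ~8 GB today:
    ram = 8
    for year in range(0, 10, 3):
        print(f"year {year}: ~{ram} GB")
        ram *= 4  # two doublings per 3 years
    # year 6: ~128 GB, year 9: ~512 GB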
It has been increasing. iPhones went from 128MB to 6GB, and there are Android phones with 18GB now.
RAM hasn't been increasing as steeply as it could, but if there's a strong use-case for it, it may happen. Also consider that Apple is in control of the whole chipset and software, so they could implement things like turning the extra RAM on only during ML computation.
I'm running a totally usable 13b parameters llama model in my macbook air, which seems to give outputs equivalent to what I was getting from GPT3 in June 2022.
How much more hardware would really be needed for GPT-4 level outputs natively? Perhaps software optimizations alone could do most of the trick.
OpenAI and MSFT couldn't figure it out and are burning millions of USD on compute, but Apple will make it run on a phone in 5 years?
Crazy bet, what makes you think that? You cannot optimize infinitely. Raytracing probably had decades of ppl trying to make it run fast and yet even today you need strong hardware
Have a look at what the open-source community is up to. They're reducing models down to mere billions of nodes and then using ChatGPT outputs to train their models for accuracy. The results are surprising.
The reason OpenAI has had to rush out plugins is software like LangChain coming in at meteoric speed.
Things are moving on a day to day basis in the ai sphere at the moment.
They underestimate chip makers' plans to embed AI in silicon.
Chips intended for launch in 5-6 years are in planning stages right now. Apple, nVidia, and Intel could bring serious hurt to software companies in the next 5-10.
Open source purists will weep but really most people do not care, and tech should not merely serve the dedicated.
I'm expecting the Mac Pro to have AI/ML massive GPU modules and/or specialized ML dedicated compute available as add-ons for in-house training. For a not-so-small fee of course. They have all the pieces, it just needs to be brought to bear on the ML training market.
The 4080 is _25x_ faster than the M2 on pure fp32 (which is what most GPUs are doing most of the time). Apple compared the M2 to the laptop 4080, using numbers heavily biased to them (running a 4080 at 10W does tend to make it not perform, yes).
Not a single benchmark in the world has supported Apple's claim that the GPU in the M2 is that powerful. It's just yet another cute embedded GPU that does the job, but nothing more. It's made to push out 8K frames really fast, which it does because of UMA, but any demanding task will have it eaten alive by any real GPU.
ML inference is not generally FP32 anymore. I was going off of the TOPs numbers for ML from a few sources, which generally agree M2 is about 22 TOPS and 4080 (desktop) is about 50.
But in any event, yes, that was my point. UMA is a huge advantage, the GPU itself is too weak to be serious.
But it's a lot easier to drop a dramatically beefier GPU into a new design than it is to update the entire platform for UMA. Apple has a huge opportunity here… whether they pursue it or not remains to be seen.
> whether they pursue it or not remains to be seen.
Pursue what though?
UMA is cool, but kinda meaningless if the majority of Macbooks are min-spec. That leaves you with 4-5gb of VRAM, assuming you've left nothing open. What is Apple going to do with that UMA that other manufacturers cannot?
It's certainly nice that 128gb Macs exist for models that might be too big to otherwise load into memory. It's useless for production inferencing though, and I struggle to imagine the "opportunities" they're missing out on here.
A Mac variant that trades CPU cores for GPU/ML cores while having 192GB+ of UMA memory.
> I struggle to imagine the “opportunities”
Two of them: 1) academic / R&D compute, where people could have at least A6000 class GPU on the desktop, and 2) cloud inference servers, probably for Apple’s own services.
I’m not saying they will or should do those things, just that the apple silicon arch is well positioned if they choose to. Bolting on exponentially better GPU is not especially difficult, and they’ve got an OS that would bring existing apps and libraries right over.
Look at it this way: is there a path to UMA on Windows / Linux? If not, those systems will always duplicate RAM and require users to decide in advance whether to allocate RAM budget to OS or ML.
Academics might buy in, but they're a small market and still fairly easy to poach with quality server hardware. You may be right about Apple using them for cloud inferencing though, seeing how they'd rather pound sand than patch up their relationship with Nvidia.
Whichever way you look at it though, neither of those are really opportunities. Apple boxed themselves into the consumer market, and now has to compete with professionally-priced products.
No it won't. Large language models are trained on 1,000 - 50,000 GPUs. No one's going to buy hundreds of Mac pros to mount them in a datacenter for training ML models.
I mean, it's possible to load LLMs onto a smartphone today. The utility is limited though, and if iPhone 19 still has arbitrary application memory requirements then I think it's safe to say they won't be using the most complete models. At the end of the day, OpenAI's "huge-ass LLM as a service" will probably be relevant longer than you think it will be. Local inferencing might be able to do simpler stuff (eg. compose assistant speech) but this is already possible with pruned models and CPU acceleration.
The bull mindset is fun to watch unfold (especially here on HN) but I think people should temper their expectations.
I never said anything about openai not being relevant. the point is scale and good enough. gpt-4 is already good enough for most reasonable use cases, notably siri optimization. 5 years is a long time. 5 years ago llms didn't even exist.
Gpt3.5-turbo runs laps around siri and is cheap, surely Apple will acquire talent to build something that does remote queries and falls back to local for the next 1-2 years while they figure out on device accelerated LLMs
I don't think they'd even need to go that far to get started. Alpaca-LoRA is better than Siri when it comes to understanding queries and can already run on consumer GPUs. I would be really surprised if they couldn't get something equivalent running on the top-series A-chip devices in relatively short order.
People keep saying this. But it’s only half the problem to process natural language in text form. How well would ChatGPT do at understanding Spanish from a non native speaker with a southern US accent? (Raises hand). I’ve yet to see a speech to text system that does well at understanding speech by non native speakers or even native English speakers with a heavy regional accent.
Almost 5 years ago TalkToTransformer did 80% of ChatGPT's job with 0% of the hype. Once people realize just how glacial all this stuff moves, I think the honeymoon phase will be over.
Definitely possible. A cool idea. But that’s also a lot of credit for “big tech”. Meta, Google, Netflix, and others have produced almost nothing cool for a decade. Apple is one of the few behemoths seemingly capable of actually getting any real work done, so you’re right they’ll be a big contender. Smaller players have less capital, but a lot less blockers too. They can take real risks, move faster, and don’t have all the bottlenecks that come with having 50,000 employees.
> Why would they want to give out their trusty LLMs
OpenAI won’t want to; open source competitors already are, and they will keep getting better. The more of a lead OpenAI has over commercial competitors, the more incentive those commercial competitors will have to back open source options.
Running locally and “give you” are two very different things. Financially you paid for the phone. From a lock in point of view the chip will be proprietary and very hard to reverse engineer. They can indeed hide it behind a local API that apps that they can ban at will would need to use.
Different directions - this is the fundamental challenge. It's hard to know what these directions are when we're flying blind at the cutting edge of technology.
We're all human. I suspect out of the 100, you'll have 95 of them going into the same as before areas. Maybe 5 truly understand the above point and explicitly strike down ideas that have been done before.
GPT-3 has been live for a while now. The industry has hundreds of story generator apps specialized for various kinds of content generation. Very few are really thinking about AGI or chain-of-thought reasoning etc as an example.
So I don't think launching 100 of them into different directions would work. Maybe 5 of them with both the expertise and direction to pursue seemingly impossible goals.
> GPT-3 was said to require something like 150gb of VRAM.
By all accounts GPT-3 is wildly inefficient in resource use; OpenAI runs like a company that’s concerned with the functionality it can achieve by calendar date, and has an almost infinite bankroll to do it. But, other actors in the field have different priorities, and the various open source or, at least, available-to-use models seem to be far more efficient than the OpenAI models of similar function (though they are behind the newest OpenAI models in function.)
I gave Alpaca-LoRA with tloen's tuning a try and it definitely feels within reach of GPT-3. Not quite as good, but some of that may just be whatever's going on with OpenAI's hidden prompts encouraging lots of text out of the model.
Basically the curtain has been drawn back and shows that current LLMs are just very inefficient and will be optimized within weeks. Whatever improvements Apple's Neural Engine offers, it just needs more RAM closer to it, which will likely come with the next hardware refresh; this year is probably too late, but whatever is released at the end of 2024 will be good enough.
Definitely. For some years now the iPhone has had AI stuff on board to categorize and organize your photos (recognising people, creatures, "things", etc), all built into the device - unlike Google, which requires you to upload them first. It's not a stretch at all to have more functionality like Siri run on-device.
Isn't that quite a risky business - if someone manages to leak the model, anyone could run it for free?
I can imagine something similar for the iPhone on the edge: once someone manages to decrypt the model, it will be free for anyone to grab, unless there is some undocumented proprietary mechanism that only works on Apple Silicon.
That's highly unlikely. I think most people can't fathom how huge the scale of GPT-4 is and what computing power it requires. Even with 10x optimizations, 10x performance, and 10x memory (highly unlikely in 5 years), that's not going to be sufficient to run it locally.
I think people are overestimating how many parameters GPT-4 is really trained on.
While also overestimating how many parameters anybody really needs after fine tuning.
The market will find the sweet spot. Right now everyone's tinkering with the 7B-parameter LLMs, and they'll move up to the 65B one once they've refined the process. I think it's fiction that anybody really needs a 10-trillion-parameter LLM at all. It will be completely niche.
Fine-tuning is a solved problem using transformer adapters, which are fast, small, and match regular fine-tuning, so that's not an issue. As for smaller models, their usefulness will quickly wane once they're seen as producing trivial gibberish next to the cloud models available in 5 years.
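To make the "fast and small" part concrete, here's a minimal adapter setup with the Hugging Face peft library (LoRA-style adapters). The model ID is just an example and the target module names are the LLaMA-style attention projections, which is an assumption on my part:

    # Minimal LoRA adapter setup: only the small injected matrices are trainable,
    # which is what keeps adapter fine-tuning cheap to train and tiny to ship.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")  # example ID

    config = LoraConfig(
        r=8,                                  # rank of the low-rank update matrices
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # LLaMA-style attention projections (assumption)
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the full parameter count

The resulting adapter checkpoint is tens of megabytes, which is part of why shipping per-user or per-task adapters on top of one shared base model looks so attractive for on-device personalisation.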
There is no benefit to OpenAI in their tech not needing a data centre. The only ways they have to lock their tech up are safety-and-ethics theatre and the hardware budget required to run it.
if you work at OpenAI and you’re optimizing anything you’re not doing your job right.
This will be the AI killer app for whoever pulls it off first. GPT-4 or higher performance on a device as nice as an iPhone...will create so many new capabilities for smart phones. It's going to be an amazing time for tech in the near future and beyond.
I’m a huge Apple fanboy; I even moved my family up to the Bay Area to work for them for a time. Hopefully you’re right, but if Siri is any indicator, for some reason Apple can’t seem to crack AI at the level OpenAI has proven able to.
it will actually play out differently: Apple will pay OpenAI to run their models on Apple Silicon, and Apple will guarantee the model is protected using hardware keys. Jailbroken phones won’t have access to the local AI.
Apple will never have their tech run AI locally any more than they allow Siri to search the internet offline or while driving. Apple will only build the client as an HTML wrapper or inside an app, but it will always run on Apple servers.
Name something that you find groundbreaking and I too can reduce it down to something mundane.
Leaps in performance across all metrics in an existing thing are groundbreaking, especially when it’s just the beginning.
On a practical level, it feels pretty groundbreaking to me when I go back to use my previously top of the line 16” MBP from a year before. I suspect you haven’t had the pleasure of using an M-series computer.
Something like FLAN-T5 can't run without top-of-the-line VMs, and GPT-4 is an order of magnitude better than that. How will it run on an iPhone, no matter how good the phone gets?
Local inference is a pipe dream. There's no way any of these companies will allow consumers to interact with their flagship models on anything but a black box api.
The black box could well be a local library. Functionally, what would be the difference between using something like AIKit and using a REST API over the network?
Local has a lot of advantages as well (latency, privacy, etc).
You can’t. That’s the joke. If you use CarPlay and ask Siri to find a local destination or a celebrity birthday, the request will be refused, since the driver is assumed to be operating the car and internet search is therefore blocked - even though it will still search Apple Maps, which obviously isn’t cached on the device.
In regards to LLMs, there is a collision between Apple's extremely good chip design capabilities and Apple's insistence that Siri never say anything that isn't 100% scripted and 100% certain not to be bad. Up until now, they've chosen to limit Siri functionality rather than leave anything to chance.
LLMs will absolutely be able to run locally, but whether Apple will be able to stop worrying and love the model remains to be seen.
It would seem to be more risky if the model is out on the device and you can’t correct any screw ups without doing an update.
They’ll have to do something about Siri soon though. Even my 5-year-old daughter told me Siri is ‘a bit thick’. And that’s just compared to Alexa, never mind ChatGPT.
I find it fascinating that this was released after the June 2022 launch of the M2 chipset and line of products, and yet Apple had no desire to show relative performance of M2 vs. M1 here - even in the simultaneous announcement here: https://machinelearning.apple.com/research/neural-engine-tra...
It's fascinating to me that at least one of two things is true: either (a) Apple has lost its ability to coordinate "hype" between its teams, or (b) the difference between comparable levels e.g. the M1 Max vs. the M2 Max are so negligible that they don't look good in an announcement like this.
Has anyone run inference for LLMs or other transformer models on comparable M1 and M2 Macs? Are there good benchmarks for this specific workload?
>I find it fascinating that this was released after the June 2022 launch of the M2 chipset and line of products
And this was put out in Aug 2022. It's very probable that the team worked on it and tested it only on M1, while M2 was kept under wraps by the separate teams working on it until the announcement. So they just wrote the announcement around the chips they had worked on - and since it's not for a commercial product, Apple didn't care to optimize the marketing anyway.
Reminded me of one of the cooler uses of old iPhones.[1] These old phones are going to continue being useful long after their initial life as a phone. As long as Apple doesn't act like Apple and lock everything down.
The bottleneck with compute at the edge is (and will be) model size (both app download time and storage space on device).
Stable Diffusion sits at about 2GB for fp16, Whisper Medium at 1.53GB, LLAMA is 120GB.
Sure, Apple can ship an optimized model (<2-4GB) as part of the OS, but what if a capable app maker wants to ship one? Users will not be happy with an app sized at >1GB.
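The back-of-the-envelope math behind those sizes is just parameter count times bytes per weight, which also shows why quantisation matters so much for shipping anything on a phone. Rough numbers only, ignoring embeddings and packaging overhead:

    # Rough on-disk size: parameters x bits per weight / 8 (ignores overhead).
    def model_size_gb(params_billions: float, bits_per_weight: int) -> float:
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    for name, params in [("7B", 7.0), ("13B", 13.0), ("65B", 65.0)]:
        sizes = ", ".join(f"{bits}-bit ~ {model_size_gb(params, bits):.1f} GB" for bits in (16, 8, 4))
        print(f"{name}: {sizes}")
    # 7B: 16-bit ~ 14.0 GB, 8-bit ~ 7.0 GB, 4-bit ~ 3.5 GB
    # 13B: 16-bit ~ 26.0 GB, 8-bit ~ 13.0 GB, 4-bit ~ 6.5 GB
    # 65B: 16-bit ~ 130.0 GB, 8-bit ~ 65.0 GB, 4-bit ~ 32.5 GB

So a heavily quantised 7B model just about fits the <2-4 GB window; anything much bigger really only works if the OS hosts one shared copy.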
Hyper-casual games are around 300 MB these days; proper AAA games are multiple GB. People still download those, as you can tell by the billions of dollars they make.
The problem with OpenAI's business model is that it's actually quite expensive for them to maintain centralised processing. With Apple, there are billions of very powerful computers already deployed to users, and those computers mostly sit idle apart from occasionally running some bloated JS to show a button or something. If Apple manages to run a good-enough model on device with acceptable performance and energy impact, then suddenly OpenAI and Microsoft are just burning money with no expectation of recouping it if they provide the service for free; if they make it paid, they'll be making money in a niche.
Apple typically solves this with device segmentation.
They can announce an iPhone Pro Ultra model that comes with higher RAM and storage capacity along with a souped up Neural Engine, similar to how they do today with how there are differences between the iPhone and the iPhone Pro screen and camera.
Even better, they could bundle the base model for LLAMA with iOS and ship incremental model updates to those iPhone Ultra users (possibly on a monthly subscription).
I wonder if they’d create a simple version that lives on device that can call on a ‘cleverer’ version for more extensive tasks - like how ChatGPT is using plugins.
Most interactions probably don’t require full power
Yeah, some sort of caching: small models for most tasks, with a way bigger model just one GET request away. It's a smart idea that hasn't really been tried yet. And I bet you could shrink the models significantly - a model doesn't have to know so much if it can Google things, but we don't really know how to train models like that yet (we just want the "common sense" without so much baked-in knowledge).
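Purely hypothetical sketch of that routing idea (all names made up); the interesting design question is where the confidence signal comes from, since today's small models aren't great at knowing what they don't know:

    # Hypothetical local-first router: answer on device when the small model is
    # confident, otherwise fall back to the big hosted model one request away.
    from dataclasses import dataclass

    @dataclass
    class Answer:
        text: str
        confidence: float  # assumes the local model can self-report confidence somehow

    def local_model(prompt: str) -> Answer:
        # Stand-in for the small on-device model.
        return Answer(text=f"[local] {prompt}", confidence=0.6)

    def remote_model(prompt: str) -> str:
        # Stand-in for the big hosted model.
        return f"[remote] {prompt}"

    def answer(prompt: str, threshold: float = 0.8) -> str:
        local = local_model(prompt)
        if local.confidence >= threshold:
            return local.text        # cheap, private, low-latency path
        return remote_model(prompt)  # escalate when the small model is unsure

    print(answer("what's the birthday of that celebrity Siri won't look up in CarPlay?"))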
Yes, by the time iphones can smoothly run llama 30G, the state of the art gpt-x will probably be a terabyte. Skynet will forever live in the cloud with just assistant agents living on the devices.
> Yes, by the time iphones can smoothly run llama 30G, the state of the art gpt-x will probably be a terabyte.
Yeah, but if every individual is running a decent chat-capable LLM, and businesses are running their own on their own devices, and those can communicate with each other, who needs to rely on Skynet?
Research developments are already showing that our models are woefully inefficient in their current state (compare the performance of GPT-3 175B against Alpaca 30B). Not only will hardware get better, the minimum model sizes for good inference will also get smaller in the future.
Apple is artificially shipping phones with low storage sizes to upsell phones with more space. If there were a big economic advantage for them to have larger models on the phone, I expect it would be easy to solve.
This provides optimised hardware acceleration on Apple Silicon for low-level machine learning libraries.
PyTorch is supported for example, it’s a machine learning library with GPU acceleration that’s been around for 6 years now. It’s used in a few commercial projects, including Tesla Autopilot. It can be used for natural language processing, image manipulation, and possibly to build an LLM I suppose, but as a low level library it just gives the base tech to build such systems from.
#2. A way to optimize running LLMs locally on Apple Silicon (including iPhones)
I am just a little better informed. As I understand it, their code improves model performance and memory consumption using PyTorch and Huggingface libraries.
Their examples compared performance across the A-series chips, and the repo includes Swift-only code samples. But they’ve also made it possible to use them with traditional tooling (torch, huggingface).
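For context (this is not Apple's repo code, just the general path I'm aware of): the usual way to get a PyTorch model in front of the Neural Engine is to trace it and convert it with coremltools, then let Core ML schedule it across CPU/GPU/ANE. A minimal sketch, assuming a toy feed-forward block:

    # General sketch: trace a small PyTorch module and convert it with coremltools
    # so Core ML can schedule it on the Neural Engine (fp16 is the ANE-friendly precision).
    import torch
    import coremltools as ct

    block = torch.nn.Sequential(
        torch.nn.Linear(512, 2048),
        torch.nn.GELU(),
        torch.nn.Linear(2048, 512),
    ).eval()

    example = torch.randn(1, 128, 512)           # (batch, sequence, channels)
    traced = torch.jit.trace(block, example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="hidden_states", shape=example.shape)],
        convert_to="mlprogram",
        compute_precision=ct.precision.FLOAT16,
        compute_units=ct.ComputeUnit.ALL,        # allow CPU, GPU, and ANE
    )
    mlmodel.save("feedforward_block.mlpackage")

Whether a given op actually lands on the ANE is up to Core ML's scheduler, which is exactly the sort of thing Apple's reference implementation seems to be trying to coax into behaving well.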
Convenience will be the biggest factor. Whoever makes it easier for the end consumer to get what they want wins. It's why ChatGPT made such a big splash in comparison to all the other AI models which were also impressive. Local inference may play a part in that if it's quicker but if the last twenty years are anything to go by it will be convenience rather than privacy which is the deciding factor. And Apple do tend to make the UX slicker than other companies so I wouldn't put it past them ending up the biggest player.
> it will be convenience rather than privacy which is the deciding factor
This. Nobody really cares about local processing for privacy.
Even those that claim to often don't mean it. Remember the total freakout over Apple's proposed local, privacy-preserving processing to detect CSAM before uploading it to iCloud? The consensus seemed to be that secret, opaque, and un-auditable cloud-based scanning was much preferable.
However, there was enough of an outcry from the community and from experts that it was pulled. So end users do care about privacy when they understand the implications.
After all, it's hard to care when you don't have the power as an end user to act on those feelings.
Do you think the current regime of Google/etc doing ad hoc scans on cloud storage, with no transparency about who is requesting scans for what, is more private?
It was pretty open — auditable and cryptographically proven sources for all triggering hashes, auditable updates. And it was mutually exclusive with cloud scanning, as it was part of E2EE.
About mutual exclusivity, you may be right. I probably misremember that specific detail, but that was before Apple enabled E2EE on iCloud. It was removed as a feature back then when they suggested local CSAM scanning.
"It was pretty open"
Not really. The scanning code wasn't open source. Apple didn't explain how their proprietary hashing algorithm worked, let alone providing any source code. It was just the promise of Apple about how the whole thing would work. If that's sufficient, then iCloud scanning is as open as local scanning.
Auditability of DB updates doesn't mean much either. Even if a third party organization detected "new additions" to the CSAM DB, what would be the next step? There would be no way to verify if those hashes actually correspond to CSAM. A dictatorship, for example, would just say "yes it's CSAM, trust us", and you'd have no option but to trust their word. Even in the US, there is no way to verify if CSAM DB fully corresponds to actual CSAM. It's just NCMEC's word. We simply don't know.
It's a reference implementation, other people are supposed to pick it up and apply the techniques to their own work
I would love to see some links to anywhere it has been used.
So far the most promising codebases for running LLMs on a Mac have been the 'cpp' reimplementations, which ditch PyTorch and run on the CPU, using other tricks to fit the model into available RAM.
As much as I love the progress in AI (also see Microsoft's recent Office Copilot) - I seriously think that government needs to step in to regulate the fair use of training data.
Though this is coloured by my personal beliefs: in order to maximise innovation, I believe it's the role of government to implement regulations that support a competitive commercial environment. Without this, monopolies will form around hard-to-obtain resources, innovation will stagnate, and consumers will be subject to exploitation.
Currently, data is mostly acquired without user consent and is accessible retroactively. Companies own that data, they can trade it and they can use it however they want to. You as a consumer have no say in this and it's virtually impossible to live a normal life without being the subject of analysis.
While it's incredible that companies can produce undeniably valuable products like Copilot - ultimately - they will profit from these products. The irony is they built them from data sourced from you, likely from something you paid for (MS Word, etc).
The key ingredient in these products is training data. If you wanted to compete with them, no matter how capable you are as an engineer, you could never make Copilot without the same scale of data Microsoft has gathered.
I don't know what kind of regulation would even out the playing field, but I wouldn't mind being compensated for my role in creating these highly profitable products.
TL;DR: execution of pytorch models on apple's neural engine and standard data-oriented optimisations (changing matrix layout, chunking to optimise temporal cache locality, and minimising redundant memory copies)
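For the "changing matrix layout" bit, my reading of the write-up is that the reference implementation prefers a channels-first (B, C, 1, S) tensor layout and expresses linear projections as 1x1 convolutions, which the ANE handles better than the usual (B, S, C) layout. A toy sketch of why that's a pure layout change rather than different math (not Apple's actual code):

    # Toy demo: a linear projection over (B, S, C) and the same projection expressed
    # as a 1x1 conv over a channels-first (B, C, 1, S) tensor compute identical results.
    import torch
    import torch.nn as nn

    B, S, C = 1, 128, 512
    x_bsc = torch.randn(B, S, C)                  # typical transformer layout
    x_bc1s = x_bsc.transpose(1, 2).unsqueeze(2)   # ANE-friendly (B, C, 1, S) layout

    linear = nn.Linear(C, C, bias=False)
    conv = nn.Conv2d(C, C, kernel_size=1, bias=False)
    conv.weight.data = linear.weight.data[:, :, None, None]  # reuse the exact same weights

    y_ref = linear(x_bsc)                                     # (B, S, C)
    y_ane = conv(x_bc1s).squeeze(2).transpose(1, 2)           # back to (B, S, C)
    print(torch.allclose(y_ref, y_ane, atol=1e-4))            # True: same math, different layout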
Not sure why Apple is throwing this code over the fence (or catapulting it over the balustrade, to continue the castle metaphor used here) and not also selling server-form-factor Apple Silicon devices for data centers. They left out an "Issues" tab on the GitHub repo, so it is intended to be quite the unidirectional act. I am not sure Apple will remain proportionally huge with respect to the rest of big tech while they squander their development on boutique mass-market products and ignore the vast growth they could achieve by expanding into first-class cloud computing markets.
It's FP16/Int8 inference only (because you can only access it via Apple frameworks that don't support training).
Also, it's only used if your data is small enough (4 MB cache), so it won't be useful for big transformers or big image processing for a while.
As a newbie to this space, I see this mentioning PyTorch. I was looking at Whisper earlier today and, somewhat impressively, was reminded that my M1 Pro has a CPU fan. Is it realistic to think it would be a modest amount of work to install this in my local venv and use the NPU instead?
This is great. I can't wait to try it on my laptop, as I like to do dev locally. But I don't understand the deployment part - besides on-device, how would you deploy this on a server, given that Apple servers aren't something cloud providers generally offer?