I think this is the point - each design team and company bets on what they believe their hardware will need to do. The performance-vs-power-consumption trade-off is really tricky when you have to guess what the software landscape will look like a few years out. Get it right and you have a winner; get it slightly wrong and you either get 'poor battery life' in reviews, or you offload work that would ideally run locally (which is the suggestion in this case).
I'm not sure I want LLMs on my phone anyway - the situations where you want a 'digital assistant' are generally ones where you've got connectivity, so offloading seems like a sensible fit for me. Of course I'm making that assumption and I'll probably look silly, but that's progress, right?
Of more interest is the heavy lifting for image processing to make the phone camera not suck (it's all smoke and mirrors with phone cameras, and more inference on device helps).
I'd love to have an LLM on my phone when there's no connectivity. There are many situations where a personal assistant would come in handy.
In the wilds of Canada somewhere:
Hey Google. Lead me through the steps to create a splint and stop the bleeding. My buddy just got attacked by a bear and is bleeding out.
What? That's not a bear? It's a raccoon? Oh... anyway.
Lead me through the steps to start a fire with wet wood, and how do I prepare a bear, I mean raccoon, carcass for good eating? I have some old copper wire and a magnet. I need to make a phone charger. Lead me through it. Also make sure the fire is smokeless, as we don't want to get discovered and sent back to the supermax.
Translating that information into an easily understood format is the job of an LLM, but that's separate from actually having the data, right? Haven't we moved the goalposts a bit if we want a model that contains the majority of human knowledge, including accurate and guaranteed non-hallucinated survival information, stored locally on our phone?
GPT-4 by some estimates is 45GB of training data. But phones can easily store 10 times that much, and more if it were really required. The bottleneck is the processing required to run inference over that data.
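To put rough numbers on that storage-vs-compute gap, here's a back-of-envelope sketch in Python. The model size, quantization level, and memory bandwidth below are illustrative assumptions (a 7B-parameter model at 4-bit quantization, ~50 GB/s of phone RAM bandwidth), not figures from the thread:

    # Storage is cheap; decode speed is roughly bounded by memory bandwidth,
    # since each generated token streams every weight through RAM once.
    def model_size_gb(params_billion, bits_per_weight):
        # Approximate size of the weights alone, in GB.
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    def decode_tokens_per_sec(size_gb, mem_bandwidth_gb_s):
        # Rough upper bound for memory-bandwidth-bound decoding.
        return mem_bandwidth_gb_s / size_gb

    size = model_size_gb(7, 4)   # ~3.5 GB of weights: trivial to store on a phone
    phone_bw = 50                # GB/s, assumed flagship LPDDR5-class bandwidth
    print(f"weights: {size:.1f} GB")
    print(f"decode ceiling: ~{decode_tokens_per_sec(size, phone_bw):.0f} tokens/s")

Even under those fairly generous assumptions the phone tops out around ten-odd tokens per second, which is why the processing, not the storage, is the hard part.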