While the title is intentionally a bit click-baitey, I think the author is saying that features _using_ the ML tensor cores are on life support. They announced/hinted that a bunch of features would use on-device ML, and in the end they are just using the cloud like they always have.
As far as I understand, some inference is done on-device. LLMs and diffusion models changed the field in the last year, and it takes time for hardware to catch up (+ work to reduce model sizes). So it's hard to run the latest models on-device: you either end up doing it in the cloud (Google's preference) or shipping weaker models (Apple's preference).
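To give a rough sense of the "reduce model sizes" part: here's a minimal PyTorch sketch showing dynamic int8 quantization, one common way to shrink weights roughly 4x so a model is more plausible to run on-device. The toy layer sizes and the choice of dynamic quantization are my own illustration, not anything claimed in the article.

```python
import os
import torch
import torch.nn as nn

# Toy stack of linear layers standing in for a much larger model.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

# Dynamic int8 quantization: Linear weights are stored as int8 and
# activations are quantized on the fly, so weights take ~1/4 the space of fp32.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Serialize the state dict to disk and report its size in MB."""
    torch.save(m.state_dict(), "tmp.pt")
    size = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return size

print(f"fp32: {size_mb(model):.1f} MB, int8: {size_mb(quantized):.1f} MB")
```

Quantization is only one lever, of course (distillation, pruning, and smaller architectures are the others), but it's the cheapest one to apply after training, which is why it shows up first when vendors try to move inference on-device.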