> How did the author just assume that CPUs are competitive for inference?
CPUs have IGPs. And they are pretty good these days.
LLMs in particular are an odd duck because the compute requirements are relatively modest compared to the massive model size, making them largely RAM-bandwidth bound. Hence DDR5 IGPs/CPUs are actually a decent fit for local inference.
It's still inefficient, yeah. Dedicated AI blocks are the way to go, and many laptop/phone CPUs already have these; they just aren't widely exploited yet.
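To make the bandwidth point concrete, here is a rough back-of-envelope sketch. The specific numbers (a ~4 GB quantized 7B model, ~90 GB/s for dual-channel DDR5-5600) are illustrative assumptions, not measurements:

```python
# Rough estimate of decode speed when memory bandwidth is the bottleneck.
# Each generated token streams (roughly) all model weights from RAM once,
# so throughput is capped at ~ bandwidth / model size.

def bandwidth_bound_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Illustrative assumptions: a 7B-parameter model at 4-bit quantization (~4 GB
# of weights) on dual-channel DDR5-5600 (~90 GB/s theoretical peak).
model_size_gb = 4.0
ddr5_bandwidth_gb_s = 90.0

print(f"~{bandwidth_bound_tokens_per_sec(model_size_gb, ddr5_bandwidth_gb_s):.0f} tokens/sec upper bound")
# ~22 tokens/sec, which is why system RAM bandwidth, not raw compute,
# tends to dominate local LLM decoding speed.
```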