LLMs and many other models spend 99% of the FLOPs in matrix multiplication. And ... | Hacker News

YetAnotherNick 5 months ago | parent | context | favorite | on: LlamaF: An Efficient Llama2 Architecture Accelerat...

LLMs and many other models spend 99% of their FLOPs in matrix multiplication, and the TPU initially had just a single operation, i.e. matrix multiply. Even if the ASIC were 100x better than a GPU at the other operations, it would be just 1% faster overall.
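For what it's worth, the 1% figure is just Amdahl's law. Here is a rough sketch of the arithmetic, assuming the share of FLOPs equals the share of runtime (that is my simplification, and it ignores memory-bound operations). The function name `overall_speedup` and the variable names are made up for illustration.

```python
def overall_speedup(fraction_accelerated: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only a fraction of the
    runtime is accelerated by `local_speedup`."""
    untouched = 1.0 - fraction_accelerated
    return 1.0 / (untouched + fraction_accelerated / local_speedup)


# Assumption: 99% of runtime is matmul, so only 1% is "other operations".
non_matmul_fraction = 0.01

# The other operations become 100x faster; matmul speed is unchanged.
print(f"{overall_speedup(non_matmul_fraction, 100.0):.4f}")  # prints 1.0100
```

Even an infinitely fast implementation of the non-matmul part caps out at 1 / 0.99, which is about a 1.01x gain under this assumption.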

danielmarkbruce 5 months ago [–]

You can still optimize the various layers of memory for a specific model, make it all 8-bit or 4-bit or whatever you want, maybe burn in a specific activation function, all kinds of stuff.

No chance you'd get only a 1% speedup on a chip designed for a specific model.
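To put a rough number on the bit-width point, here is a small sketch of how weight storage scales with precision. The 7-billion-parameter count is an assumption chosen to match the smallest Llama 2 variant, `weight_bytes` is a hypothetical helper, and this counts weights only (no activations or KV cache).

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_weight / 8


# Assumption: a 7-billion-parameter model.
NUM_PARAMS = 7_000_000_000

for bits in (16, 8, 4):
    gb = weight_bytes(NUM_PARAMS, bits) / 1e9  # decimal gigabytes
    print(f"{bits:>2}-bit: {gb:.1f} GB")
```

Halving the bit width halves the bytes that have to move through each layer of memory, which is one reason a model-specific chip could plausibly gain more than the matmul FLOP count alone suggests.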
