As far as I understand, the main issue for LLM inference is memory bandwidth and capacity. Tensor cores are already an ASIC for matmul, and they idle half the time waiting on memory.
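As a rough illustration of why that is, here's a back-of-the-envelope roofline sketch for a single-token decode step (the H100-class numbers are assumed ballpark figures, not measurements):

    # Roofline check for batch-1 decode (all hardware numbers are assumed ballpark figures).
    peak_flops = 1.0e15              # ~1 PFLOP/s dense FP16 tensor-core throughput (assumed)
    mem_bw     = 3.3e12              # ~3.3 TB/s HBM bandwidth (assumed)
    balance    = peak_flops / mem_bw # FLOPs per byte needed to keep tensor cores busy

    # One decode step over a W @ x matvec with an [n, n] FP16 weight matrix:
    n = 8192
    flops  = 2 * n * n               # one multiply-add per weight
    bytes_ = 2 * n * n               # each FP16 weight read once from HBM (activations negligible)
    intensity = flops / bytes_       # ~1 FLOP per byte

    print(f"hardware balance point: {balance:.0f} FLOPs/byte")
    print(f"decode matvec intensity: {intensity:.0f} FLOPs/byte")
    # With arithmetic intensity (~1) far below the balance point (~300), throughput is set
    # by memory bandwidth, and the tensor cores sit mostly idle at small batch sizes.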
