Inference speed is heavily dependent on memory read/write speed relative to model size. As long as you can fit the model in memory, what'll determine performance is the memory bandwidth.
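Back-of-envelope: at batch size 1, every weight has to be streamed from memory once per generated token, so tokens/sec is roughly bandwidth divided by model size in bytes. A rough sketch (the hardware and model numbers are illustrative assumptions, not measurements):

```python
# Rough upper bound on single-stream decode speed for a
# memory-bandwidth-bound model: every parameter is read once per token.
# All figures below are illustrative assumptions.

def max_tokens_per_sec(n_params: float, bytes_per_param: float, mem_bw_gbps: float) -> float:
    model_bytes = n_params * bytes_per_param
    return mem_bw_gbps * 1e9 / model_bytes

# e.g. a 7B-param model in 4-bit (~0.5 bytes/param) on ~1 TB/s of bandwidth
print(max_tokens_per_sec(7e9, 0.5, 1000))  # ~285 tokens/s upper bound
```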
This is not universally true, though I see the claim repeated here too often, and it is especially not true for small models: small models are compute-bound.
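One way to sanity-check which regime you're in is a roofline-style comparison: the workload's arithmetic intensity (FLOPs per byte moved) against the hardware's ridge point (peak FLOPs divided by memory bandwidth). A minimal sketch, with hardware figures that are assumptions for illustration:

```python
# Roofline check: a workload is memory-bound when its arithmetic
# intensity (FLOPs per byte) falls below the hardware ridge point
# (peak FLOPs / memory bandwidth). Figures below are assumptions.

def ridge_point(peak_tflops: float, mem_bw_gbps: float) -> float:
    return peak_tflops * 1e12 / (mem_bw_gbps * 1e9)  # FLOPs per byte

def decode_intensity(batch: int, bytes_per_param: float) -> float:
    # A matvec does ~2 FLOPs per parameter per sequence in the batch,
    # while the weights are read from memory once for the whole batch.
    return 2 * batch / bytes_per_param

hw_ridge = ridge_point(peak_tflops=300, mem_bw_gbps=1000)  # ~300 FLOPs/byte
for b in (1, 64, 512):
    ai = decode_intensity(b, bytes_per_param=2.0)  # fp16 weights
    regime = "memory-bound" if ai < hw_ridge else "compute-bound"
    print(f"batch={b}: intensity ~{ai:.0f} FLOPs/byte -> {regime}")
```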