
Qwen2.5 has a 32B release, and quantised at q5_k_m it *just about* completely fills a 4090.

It's a good model, too.




Do you also need space for context on the card to get decent speed though?


Depends how much you need. Dropping to q4_k_m gives you 3GB back if that makes the difference.
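For a rough sense of the numbers, here is a back-of-the-envelope sketch. Everything in it is an assumption rather than a measurement: the parameter count (about 32.8B), the effective bits per weight for each quant (about 5.7 for q5_k_m and 4.85 for q4_k_m), and the attention shape (64 layers, 8 KV heads, head dimension 128, fp16 cache). The helper names `weights_gib` and `kv_cache_gib` are made up for illustration, and real usage will also include compute buffers that this ignores.

```python
# Back-of-the-envelope VRAM budget. All constants are assumptions,
# not measured values; check them against the actual model files.
GIB = 1024 ** 3

def weights_gib(params_billion, bits_per_weight):
    """Approximate size of the quantised weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB

def kv_cache_gib(tokens, layers=64, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate KV cache size in GiB (one K and one V tensor per layer)."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token_bytes * tokens / GIB

q5 = weights_gib(32.8, 5.7)    # assumed effective bits/weight for q5_k_m
q4 = weights_gib(32.8, 4.85)   # assumed effective bits/weight for q4_k_m

print(f"q5_k_m weights: {q5:.1f} GiB")
print(f"q4_k_m weights: {q4:.1f} GiB")
print(f"freed by dropping to q4_k_m: {q5 - q4:.1f} GiB")
print(f"KV cache at 8192 tokens: {kv_cache_gib(8192):.1f} GiB")
```

Under those assumptions the cache costs 256 KiB per token, so an 8192-token context needs about 2 GiB, which is roughly what the drop to q4_k_m frees up.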



