Hacker News new | past | comments | ask | show | jobs | submit login

What speed should we expect from the model on consumer hardware? I tried a 8 bit quantized version on 4090 and got it to generate 100 tokens for 13 second, which seems a bit slow to me.



Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: