
They're memory-bandwidth limited, so you can basically estimate the performance from the time it takes to read the entire model from RAM for each token.
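As a rough sketch of that estimate: if every generated token needs one full pass over the weights, the time per token is about model size divided by memory bandwidth. The function name and the numbers below are made up for illustration, and this ignores compute time, the KV cache, and batching, so treat the result as an upper bound rather than a prediction.

```python
def estimate_tokens_per_second(model_size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Rough upper bound on decode speed, assuming each token requires
    reading all model weights from memory exactly once."""
    if model_size_gb <= 0 or bandwidth_gb_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    # time per token = size / bandwidth, so tokens per second = bandwidth / size
    return bandwidth_gb_per_s / model_size_gb


# Hypothetical numbers: a 4 GB model on a machine with 100 GB/s of memory bandwidth.
print(estimate_tokens_per_second(4.0, 100.0))  # 25.0
```

With the same assumptions, doubling the model size or halving the bandwidth halves the estimate.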




