You can run a 7B model on CPU relatively quickly. If you want to go faster, the best value in public clouds may be a rented Mac mini.
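For what it's worth, single-user CPU inference is only a few lines through the llama-cpp-python bindings. A rough sketch (the GGUF path, model choice, and thread count are placeholders; any 4-bit quantized 7B model works):

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Hypothetical path to a 4-bit quantized 7B GGUF file
    llm = Llama(
        model_path="./mistral-7b-instruct.Q4_K_M.gguf",
        n_ctx=2048,
        n_threads=8,  # roughly your physical core count
    )

    out = llm("Q: Name the planets in the solar system. A:",
              max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])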



Do you have any resources to read on how to host LLMs in general? I am looking for scalable ways to host our own models. Thanks.


Sorry, I haven’t followed the latest developments in running at scale since the summer. I don’t have concurrent users, so llama.cpp or diffusers are good enough for me.
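For the single-user case, diffusers is similarly short. A minimal sketch (the checkpoint ID is just an example; any diffusers-compatible model works):

    import torch
    from diffusers import StableDiffusionPipeline

    # Example checkpoint; swap in whatever model you actually use
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe.to("cuda")  # use "cpu" (and drop float16) if you have no GPU

    image = pipe("an astronaut riding a horse").images[0]
    image.save("astronaut.png")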



