
I was curious what it takes to run this; the smallest OVH public cloud instance with a GPU costs $500+/month before taxes.



You can run a 7B model on CPU relatively quickly. If you want to go faster, the best value in public clouds may be a rented Mac mini.
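For a sense of what CPU-only inference looks like, here is a minimal sketch using llama-cpp-python; the GGUF filename and thread count are placeholders, not anything specific to this release:

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # Hypothetical path to a 4-bit-quantized 7B GGUF; needs roughly 4-5 GB of RAM.
    llm = Llama(
        model_path="./llama-2-7b-chat.Q4_K_M.gguf",
        n_ctx=2048,    # context window
        n_threads=8,   # tune to your physical core count
    )

    out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])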


Do you have any resources to read on how to host LLMs in general? I am looking for scalable ways to host our own models. Thanks.


Sorry, I haven’t followed the latest developments for running at scale since the summer. I don’t have concurrent users, so llama.cpp or diffusers are good enough for me.
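For a single user, the simplest self-hosted setup I know of is llama.cpp's bundled HTTP server plus a tiny client; the host, port, and model path below are assumptions for illustration:

    # Assumes llama.cpp's server is already running locally, e.g.:
    #   ./llama-server -m ./llama-2-7b-chat.Q4_K_M.gguf --port 8080
    import json
    import urllib.request

    payload = {"prompt": "Explain KV caching in one sentence.", "n_predict": 64}
    req = urllib.request.Request(
        "http://localhost:8080/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["content"])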


Could it run on a 4x 3090 24GB rig?

These can be built for about $4500 or less all-in.

Inference FLOPs will be roughly equivalent to ~1.8X A100 perf.


You could run it on a single high-end GPU. I can run Llama 2's models (except 70B) on my 4080.


This can run on a single 2060S with 8 GB.


With what degree of quantization?


Just the default weights ollama pulls (those are 4-bit quantized by default). It's fast too. 13B is where things get slow.
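If anyone wants to try the same thing, here is a minimal sketch with the ollama Python client (the model tag and prompt are just examples, and the local ollama daemon must already be running):

    # pip install ollama
    import ollama

    # The default "llama2" tag pulls a 4-bit-quantized 7B model.
    response = ollama.generate(model="llama2", prompt="Why is the sky blue?")
    print(response["response"])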


Does a 4x 3090 rig need NVSwitch?


Presumably some compute-per-hour service would make more sense for playing around with it?



