
34B should be able to run on a 24GiB consumer graphics card, or a 32GiB Mac (M1 / M2 chips), with 5~6-bit quantization (and 7B should be able to run on your smart toaster).
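
The back-of-envelope math, for anyone wondering where those numbers come from (a rough sketch; actual usage also depends on context length, KV cache, and runtime overhead):

    # Rough VRAM/RAM estimate for a quantized model: weights only,
    # plus a fixed allowance for KV cache and runtime overhead.
    def mem_estimate_gib(params_billion, bits_per_weight, overhead_gib=1.5):
        weight_bytes = params_billion * 1e9 * bits_per_weight / 8
        return weight_bytes / 2**30 + overhead_gib

    for name, params, bits in [("7B", 7, 5.5), ("34B", 34, 5.5)]:
        print(f"{name} @ ~{bits} bits/weight: ~{mem_estimate_gib(params, bits):.1f} GiB")
    # 7B -> ~6 GiB, 34B -> ~23 GiB, so 34B just barely fits in 24 GiB at 5~6 bit.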



Are there cloud offerings to run those models on somebody else's computer?

Any "eli5" tutorial on how to do so, if so?

I want to give these models a run, but I have no powerful GPU to run them on, so I don't know where to start.


runpod, togethercomputer, replicate.

Matthew Berman has a tutorial on YT showing how to use TheBloke's docker containers on runpod. Sam Witteveen has done videos on together and replicate; both offer cloud-hosted LLM inference as a service.
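
For replicate specifically, the Python client keeps it pretty painless. A minimal sketch, assuming `pip install replicate` and a REPLICATE_API_TOKEN in your environment; the model slug below is only an example, pick whatever is in their catalog (some models also want an explicit ":version" suffix):

    import replicate  # reads REPLICATE_API_TOKEN from the environment

    # Example slug -- substitute the hosted Llama/CodeLlama model you actually want.
    output = replicate.run(
        "meta/llama-2-70b-chat",
        input={"prompt": "Explain 4-bit quantization in one paragraph."},
    )
    # Chat models on replicate stream tokens, so the result is an iterator of strings.
    print("".join(output))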


I started something here about this: https://news.ycombinator.com/item?id=37121384


On runpod there is a TheBloke template with everything set up for you. An A6000 is good enough to run 70b 4bit.
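
The arithmetic roughly checks out (same caveats as above about KV cache and context length): the A6000 has 48 GB of VRAM, and a 4-bit 70B is on the order of 37 GiB of weights.

    # 70B at ~4.5 bits/weight effective (4-bit quant plus scales/zero-points):
    weights_gib = 70e9 * 4.5 / 8 / 2**30
    print(f"~{weights_gib:.0f} GiB of weights")  # ~37 GiB, leaves headroom on a 48 GB A6000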



