
34B should be able to run on a 24GiB consumer graphics card, or a 32GiB Mac (M1 / M2 chips), with 5~6-bit quantization (and 7B should be able to run on your smart toaster).
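
The back-of-envelope math, for anyone wondering where those numbers come from (a rough sketch; actual usage also depends on context length, KV cache, and runtime overhead):

    # Rough VRAM/RAM estimate for a quantized model: weights only,
    # plus a fixed allowance for KV cache and runtime overhead.
    def mem_estimate_gib(params_billion, bits_per_weight, overhead_gib=1.5):
        weight_bytes = params_billion * 1e9 * bits_per_weight / 8
        return weight_bytes / 2**30 + overhead_gib

    for name, params, bits in [("7B", 7, 5.5), ("34B", 34, 5.5)]:
        print(f"{name} @ ~{bits} bits/weight: ~{mem_estimate_gib(params, bits):.1f} GiB")
    # 7B -> ~6 GiB, 34B -> ~23 GiB, so 34B just barely fits in 24 GiB at 5~6 bit.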



Are there cloud offerings to run those models on somebody else's computer?

Any "eli5" tutorial on how to do so, if so?

I want to give these models a run, but I have no powerful GPU to run them on, so I don't know where to start.


runpod, togethercomputer, replicate.

Matthew Berman has a tutorial on YT showing how to use TheBloke's docker containers on runpod. Sam Witteveen has done videos on together and replicate; both offer cloud-hosted LLM inference as a service.
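
For replicate specifically, the Python client keeps it pretty painless. A minimal sketch, assuming `pip install replicate` and a REPLICATE_API_TOKEN in your environment; the model slug below is only an example, pick whatever is in their catalog (some models also want an explicit ":version" suffix):

    import replicate  # reads REPLICATE_API_TOKEN from the environment

    # Example slug -- substitute the hosted Llama/CodeLlama model you actually want.
    output = replicate.run(
        "meta/llama-2-70b-chat",
        input={"prompt": "Explain 4-bit quantization in one paragraph."},
    )
    # Chat models on replicate stream tokens, so the result is an iterator of strings.
    print("".join(output))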


I started something here about this: https://news.ycombinator.com/item?id=37121384


On runpod there is a TheBloke template with everything set up for you. An A6000 is good enough to run 70b 4bit.
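
The arithmetic roughly checks out (same caveats as above about KV cache and context length): the A6000 has 48 GB of VRAM, and a 4-bit 70B is on the order of 37 GiB of weights.

    # 70B at ~4.5 bits/weight effective (4-bit quant plus scales/zero-points):
    weights_gib = 70e9 * 4.5 / 8 / 2**30
    print(f"~{weights_gib:.0f} GiB of weights")  # ~37 GiB, leaves headroom on a 48 GB A6000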



