A 34B model should be able to run on a 24 GiB consumer graphics card, or a 32 GiB Mac (M1 / M2 chips), with 5–6 bit quantization (and a 7B should be able to run on your smart toaster).
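For local inference, a minimal sketch using llama-cpp-python is below. The model path is a placeholder for whatever quantized GGUF file you download (e.g., a Q5_K_M quant from TheBloke on Hugging Face), and the prompt format depends on the specific model:

```python
# Minimal sketch of local inference with llama-cpp-python
# (pip install llama-cpp-python). The model path is a placeholder:
# point it at your downloaded quantized GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="./codellama-34b.Q5_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to GPU (Metal on Apple Silicon)
    n_ctx=4096,       # context window size
)

out = llm("Write fizzbuzz in Python.", max_tokens=256)
print(out["choices"][0]["text"])
```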
Matthew Berman has a tutorial on YouTube showing how to use TheBloke's Docker containers on RunPod. Sam Witteveen has done videos on Together and Replicate, both of which offer cloud-hosted LLM inference as a service.
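As a hedged sketch of what the cloud-hosted route looks like, here is the Replicate Python client (pip install replicate; requires REPLICATE_API_TOKEN in your environment). The model slug and input parameter names are illustrative and vary per model, so check replicate.com for the current listings:

```python
# Hedged sketch of cloud-hosted inference via Replicate's Python client.
# The model slug and input keys are illustrative; each hosted model
# documents its own parameters.
import replicate

output = replicate.run(
    "meta/codellama-34b-instruct",  # illustrative model slug
    input={"prompt": "Write fizzbuzz in Python.", "max_tokens": 256},
)
# Language models on Replicate typically stream output as string chunks.
print("".join(output))
```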