Hacker News

Nothing complex, actually, but it's a little messy and cobbled together.

I run oobabooga's API on docker with a 13B 4bit quantized model. https://github.com/oobabooga/text-generation-webui
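The comment doesn't show how the API gets called, so here is a minimal client sketch. The endpoint and payload shape are assumptions: older builds of text-generation-webui exposed a blocking endpoint at `/api/v1/generate`, while newer builds ship an OpenAI-compatible `/v1/completions` instead, so check which one your version serves.

```python
import json
import urllib.request

# Assumed host/port for the container's API -- adjust to your setup.
API_URL = "http://localhost:5000/api/v1/generate"

def build_payload(prompt: str, max_new_tokens: int = 200) -> dict:
    """Assemble the JSON body the (assumed) generate endpoint expects."""
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": 0.7,
        "do_sample": True,
    }

def generate(prompt: str) -> str:
    """POST the prompt and return the generated continuation."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Response shape assumed from the older blocking API.
    return body["results"][0]["text"]
```

With a 13B model quantized to 4 bits, the weights fit in roughly 8 GB, which is why a 12 GB card like the 3060 can serve it with room for context.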

We use RTX 3060s because they're the best bang for the buck in terms of VRAM. Our current setup is mostly a proof of concept, used for internal office work, while we work on scaling: we're building a fluid handler so it can distribute workloads across the multiple GPUs.
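The "fluid handler" isn't described in the comment, but one minimal shape such a dispatcher could take is round-robin over per-GPU work queues. Everything below (class name, GPU ids, queue-per-GPU layout) is an illustrative sketch, not the poster's actual design:

```python
import itertools
from queue import Queue
from typing import List

class GpuDispatcher:
    """Toy round-robin dispatcher: each GPU gets its own work queue and
    incoming jobs are dealt out in turn. A real handler would also weigh
    queue depth and VRAM headroom instead of just alternating."""

    def __init__(self, gpu_ids: List[int]):
        self.queues = {gpu: Queue() for gpu in gpu_ids}
        self._cycle = itertools.cycle(gpu_ids)

    def submit(self, job: str) -> int:
        """Place a job on the next GPU's queue; return the GPU id chosen."""
        gpu = next(self._cycle)
        self.queues[gpu].put(job)
        return gpu

d = GpuDispatcher([0, 1, 2])
assignments = [d.submit(f"prompt-{i}") for i in range(6)]
# assignments cycles 0, 1, 2, 0, 1, 2 -- two jobs per GPU
```

Round-robin is the simplest policy that keeps all cards busy under a steady stream of similar-sized requests; uneven prompt lengths are where a smarter scheduler starts to pay off.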

Luckily, the crypto mining community laid the groundwork for some of the hardware.




> Luckily, the crypto mining community laid the groundwork for some of the hardware.

Are you using riser cards to connect the GPUs to the motherboards, then? I thought about trying a setup like yours, but was worried that the riser card interfaces would create a bottleneck. Ideally I'd like to run some cards in a separate box and connect them to my main computer through some kind of cable interface, but I'm not sure that's possible without seriously affecting performance.



