For basic usage, you can get away with a small graphics card or no graphics card at all, though inference will be very slow without one.

The general rule of thumb: take the model's parameter count (7B, 13B, 34B, 70B) and multiply it by 0.5 for 4-bit quantization or 0.625 for 5-bit. If the result (in GB) is smaller than the combined amount of system RAM and VRAM in your machine, you can run the model at that quantization level.
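That rule of thumb can be sketched as a small calculation (the factors 0.5 and 0.625 are bytes per parameter at 4-bit and 5-bit; actual usage will be somewhat higher due to context/KV-cache overhead, which this ignores):

```python
# Rough memory estimate for running a quantized model, per the rule above.
# 4-bit weights take ~0.5 bytes/param, 5-bit ~0.625 bytes/param.

def min_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate GB of RAM+VRAM needed just to hold the weights."""
    bytes_per_param = bits_per_param / 8  # e.g. 4 bits -> 0.5 bytes
    return params_billions * bytes_per_param

for size in (7, 13, 34, 70):
    q4 = min_memory_gb(size, 4)
    q5 = min_memory_gb(size, 5)
    print(f"{size}B model: ~{q4:.1f} GB at 4-bit, ~{q5:.2f} GB at 5-bit")
```

So a 7B model needs roughly 3.5 GB at 4-bit, while a 70B model needs about 35 GB at 4-bit or ~43.75 GB at 5-bit, which is why 70B models usually require either a lot of system RAM or multiple GPUs.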
