
Isn't Llama-3-70B so good that Reddit llamaers are saying people should buy hardware to run it?

Llama-3-8B was garbage for me, but damn, 70B is good enough.




The unquantized Llama 70B requires about 142 GB of VRAM. Some of the quantized versions are quite decent, but they tend to get over-quantized below roughly 26.5 GB of VRAM (~3 bits per weight).

So at minimum you'd be looking at dual 3090s with NVLink for about $4,000. Or, for the highest-performing non-quantized model, you'd be spending about $40,000 for two A100s.
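Back-of-envelope, assuming roughly 70.6B parameters and counting weights only (KV cache and activations add a bit more on top):

    # Rough VRAM needed just for the Llama-3-70B weights at various
    # quantization levels (assumes ~70.6B parameters; KV cache excluded).
    PARAMS = 70.6e9

    def weight_vram_gb(bits_per_weight):
        return PARAMS * bits_per_weight / 8 / 1e9  # decimal GB

    for label, bpw in [("fp16 (unquantized)", 16), ("8-bit", 8),
                       ("4-bit", 4), ("~3-bit", 3)]:
        print(f"{label:>20}: {weight_vram_gb(bpw):6.1f} GB")
    # -> ~141 GB, ~71 GB, ~35 GB, ~26.5 GB respectively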


So an M-series MacBook is a decent buy.
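Right, 64 GB+ of unified memory fits a ~4-bit 70B quant. A minimal sketch with llama-cpp-python on the Metal backend (the GGUF filename is just a placeholder for whichever quant you download):

    # Sketch: run a 4-bit GGUF quant of Llama-3-70B on an M-series Mac
    # via llama-cpp-python (Metal backend). Model path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",
        n_gpu_layers=-1,  # offload every layer to the GPU (Metal)
        n_ctx=4096,
    )

    out = llm("Q: Name the largest moon of Saturn. A:", max_tokens=16)
    print(out["choices"][0]["text"])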


No need for NVLink just for inference, not even with tensor parallelism. And you can get used 3090s for much cheaper than that.
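For example, a sketch with vLLM splitting the model across two cards over plain PCIe (the model name is illustrative; two 24 GB cards would need a quantized checkpoint, e.g. AWQ):

    # Sketch: tensor-parallel inference across two GPUs without NVLink;
    # activations cross over PCIe. Assumes vLLM and enough combined VRAM
    # for the chosen checkpoint (use a quantized one on 2x 24 GB cards).
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Meta-Llama-3-70B-Instruct",  # or an AWQ/GPTQ quant
        tensor_parallel_size=2,  # shard weights and compute across 2 GPUs
    )

    sampling = SamplingParams(temperature=0.7, max_tokens=64)
    outputs = llm.generate(["Why is NVLink optional for inference?"], sampling)
    print(outputs[0].outputs[0].text)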


True, you can buy two 4090 FEs brand new for $4,000.



