The unquantized Llama 70B requires 142GB of VRAM. Some of the quantized versions are quite decent, but they tend to get over-quantized below roughly 26.5GB of VRAM (~3 bits per weight).
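Rough math behind those numbers, if anyone's curious. This is just a back-of-the-envelope sketch counting weight storage only; KV cache and runtime overhead are ignored, which is why the real unquantized figure (142GB) comes in a bit above the raw 140GB:

```python
# Approximate VRAM needed just to hold the weights of a dense LLM.
# Ignores KV cache, activations, and framework overhead.

def vram_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight memory in GB: params * bits / 8 bits-per-byte / 1e9 bytes-per-GB."""
    return n_params * bits_per_weight / 8 / 1e9

params = 70e9  # Llama 70B

for bpw, label in [(16, "fp16 (unquantized)"), (8, "8-bit"), (4, "4-bit"), (3, "~3-bit")]:
    print(f"{label:>20}: {vram_gb(params, bpw):6.2f} GB")
# fp16 (unquantized): 140.00 GB
#              8-bit:  70.00 GB
#              4-bit:  35.00 GB
#             ~3-bit:  26.25 GB  <- right around the ~26.5GB quality floor
```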
So at minimum you’d be looking at a dual-3090 setup with NVLink for around $4,000. Or, for the highest-performing non-quantized model, you’d be spending about $40,000 on two A100s.
Llama-3-8B was garbage for me, but damn, 70B is good enough.