It might be tough to make more efficient, but $15k seems like exactly the price of "stick 6 4090s in a decent box and throw in a couple grand for my troubles", versus any revolutionary hardware configuration. The way it advertises running fp16 Llama 70B feels a bit contrived too, given the prevalence of quantizing to 8 bit at minimum.
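For scale, here's a back-of-envelope sketch of the weight memory a 70B model needs at different precisions (assuming a round 70e9 parameters and ignoring KV cache and activations):

    # Rough weight footprint for a ~70B-parameter model.
    params = 70e9
    for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
        gb = params * bytes_per_param / 1e9
        print(f"{name}: {gb:.0f} GB of weights")
    # fp16: 140 GB, int8: 70 GB, int4: 35 GB.
    # 6x 24 GB 4090s = 144 GB of VRAM, so fp16 barely fits,
    # with almost no room left over for KV cache.

Which is exactly why 8-bit or lower quantization is the norm for 70B-class models on consumer hardware.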



In my opinion the best hardware for running big models is a Mac Studio with an M2 Ultra. You get 192GB of unified RAM, which can run pretty much every available model without losing performance. And it would cost you half that price.


> without losing performance

But isn't the M2 Ultra over 20x slower than this thing? ~30 TFLOPS vs 738.


By "losing performance" I meant you don't have to quantize the model heavily, since it fits in RAM. My bad for not clarifying.
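To make the fit concrete (same back-of-envelope numbers as above, ~140 GB of fp16 weights):

    # 192 GB of unified memory vs. ~140 GB of fp16 Llama 70B weights.
    headroom_gb = 192 - 140
    print(f"{headroom_gb} GB left for KV cache, activations, and the OS")

So the unquantized model fits with room to spare.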


According to this [1] article (currently at the top of HN), memory bandwidth is typically the limiting factor, so as long as your batch size isn't huge you probably aren't losing too much performance.

[1] https://finbarr.ca/how-is-llama-cpp-possible/
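A minimal sketch of that article's argument: at batch size 1, every weight is read once per generated token, so memory bandwidth caps decode speed. The bandwidth figures below are published specs (~800 GB/s for the M2 Ultra, ~1008 GB/s for a single RTX 4090); treat the outputs as upper bounds, not benchmarks:

    # Bandwidth-bound upper limit on single-stream decode speed:
    # tokens/s <= bandwidth / bytes of weights streamed per token.
    def max_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
        return bandwidth_gb_s / weights_gb

    weights_gb = 140  # fp16 Llama 70B
    print(f"M2 Ultra: ~{max_tokens_per_sec(800, weights_gb):.1f} tok/s")
    print(f"one 4090: ~{max_tokens_per_sec(1008, weights_gb):.1f} tok/s (if it fit)")

Splitting the model across six 4090s multiplies aggregate bandwidth but adds interconnect overhead, so the real-world gap is much narrower than the raw ~25x TFLOPS difference suggests.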



