Haha, I just finished ordering 32GB of additional memory for my PC so I can run the 65B model, if that tells you anything. I'm upgrading from 32GB -> 64GB.

7B is fine, 13B is better. Both are fun toys that almost make sense most of the time, but even with a lot of parameter tuning they're often incoherent. You can tell that they have encoded fewer relationships between concepts than the higher-parameter models we've gotten used to--they're much closer to GPT-2 than GPT-3.

They're good enough to whet my appetite and give me a lot of ideas of what I want to do, they're just not quite good enough to make those applications reliably useful. Based on the reports I'm hearing here of just how much better the 65B model is than the 7B, I decided it was worth $80 for a few new sticks of RAM to be able to use the full model. Still way cheaper than buying a graphics card capable of handling it.


Heh, you just made me upgrade as well. After originally paying 130 € for 32 GB, it’s nice that I only had to pay 70 € to double it ;) Not sure if I want to run LLMs (or if my Ryzen 5 3600 is even powerful enough), but I’ve wanted some more RAM for a while.


If I were running in a server context, would the 50 GB of RAM be required to respond to one request, or can it be used to respond to multiple requests simultaneously?


I'm very late to this question, but I believe that amount is only required once (the weights can be shared across requests), while the context tensor will need to be created per request. I haven't confirmed that, though.
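
To illustrate what I mean, here's a minimal Python sketch (not llama.cpp's actual API; the names, sizes, and numpy stand-ins are made up): the weights sit in one buffer loaded at startup, and every request reads from it while allocating only its own small context buffer.

    import threading
    import numpy as np

    # Loaded once at startup; stands in for the tens of GB of model weights.
    # Every request reads from this same buffer, so it is not duplicated.
    WEIGHTS = np.zeros(1_000_000, dtype=np.float32)

    def handle_request(prompt):
        # Hypothetical per-request state: the context (KV cache) is allocated
        # fresh for each request and is small relative to the shared weights.
        context = np.zeros((2048, 512), dtype=np.float32)
        # ... inference would read WEIGHTS and update context here ...
        print(prompt, "->", context.nbytes, "bytes of private state")

    threads = [threading.Thread(target=handle_request, args=(f"request {i}",))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

If that's right, concurrent requests mostly cost you one context buffer each, not another full copy of the model.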


I'd assume that all the calculations used for a single request would already eat up that amount of memory, but I could be wrong!
