
I've been looking around at open source offerings that can run on "affordable" hardware. Right now you can run something like GPT-J on 4 RTX 3090s, which will cost you anywhere between $6k and $8k, not including the rest of the PC.
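For reference, the stock way to load GPT-J locally is through Hugging Face transformers. A minimal sketch (assuming you have enough combined GPU memory and the accelerate library installed for device_map; the prompt and sampling settings are just for illustration):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B",
        torch_dtype=torch.float16,  # fp16: roughly 12 GB of weights instead of ~24 GB in fp32
        device_map="auto",          # needs `accelerate`; spreads layers across available GPUs
    )

    prompt = "Open source language models are"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

Even in fp16 the 6B-parameter weights alone are around 12 GB, which is why people end up splitting the model across multiple consumer cards.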

Once functional models get small enough, I think we'll see an absolute explosion in AI as anyone with a laptop will be able to freely tinker with their models.




You can run GPT-J on an M1 or a Pi if you want; you'll just have to settle for a pruned model. No self-hosted option (besides maybe Facebook's leaked LLaMA weights) can stand up to ChatGPT's consistency or size. I've also been playing with this and have had great results for non-interactive use on even the smallest models (125M params, ~2 GB RAM). The problem as I see it won't be inference time/acceleration so much as having enough memory to load the model, and settling for lower-quality answers. ChatGPT is already pretty delusional, and pruning a model doesn't make it any smarter. You can practically feel the missing connections in the quality of the responses.
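To make the "smallest model" case concrete, here's a rough sketch of non-interactive CPU generation. I'm using the off-the-shelf GPT-Neo 125M as a stand-in for a tiny/pruned model; the model choice, prompt, and settings are purely illustrative:

    from transformers import pipeline

    # ~125M params, loads in roughly 1-2 GB of RAM; device=-1 means CPU only
    generator = pipeline(
        "text-generation",
        model="EleutherAI/gpt-neo-125M",
        device=-1,
    )

    result = generator(
        "Summarize in one sentence: the mitochondria is the powerhouse of the cell.",
        max_new_tokens=40,
        do_sample=False,   # deterministic output suits batch / non-interactive jobs
    )
    print(result[0]["generated_text"])

On a laptop this runs fast enough for batch jobs; the ceiling is answer quality, not speed.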

So, big takeaway: AI text generation is legible and quick with smaller models, but "intelligence" à la ChatGPT seems to scale directly with memory.


Could we use two models at different levels of abstraction? One for "ideas" and one for compressing those ideas into words? I've been thinking that smaller, specialized networks might boost space efficiency.

I've no clue how these would be wired together, but I do have some ideas.
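Purely as a sketch of one possible wiring (the model choices and prompt formats here are made up for illustration, not an established architecture): chain two generation pipelines, a small one that drafts a terse outline and a second one that expands it into prose.

    from transformers import pipeline

    # Stage 1 model: small, drafts a terse outline ("ideas")
    idea_model = pipeline("text-generation", model="EleutherAI/gpt-neo-125M", device=-1)
    # Stage 2 model: a bit larger, turns the outline into prose ("words")
    wording_model = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B", device=-1)

    topic = "why small language models are useful"

    # Stage 1: compress the task into a short bullet outline
    outline = idea_model(
        f"Write three short bullet points about {topic}:\n-",
        max_new_tokens=40, do_sample=True, temperature=0.7,
    )[0]["generated_text"]

    # Stage 2: expand the outline into readable prose
    prose = wording_model(
        f"Turn these notes into a paragraph:\n{outline}\n\nParagraph:",
        max_new_tokens=80, do_sample=True, temperature=0.7,
    )[0]["generated_text"]

    print(prose)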


Have you seen the latest Llama and Alpaca models?





