
> 1. You're working backwards from a desire to buy more RAM to try to find uses for it.

I'm really not.

I had no desire at all until a couple of weeks ago. Even now, not so much, since it wouldn't be very useful to me.

But the current LLM business model, where there are a small number of API providers and anything built using this new tech is forced into a subscription model... I don't see it as sustainable, and I think the buzz around llama.cpp is a taste of that.

I'm saying: imagine a future where it is painless to run a ChatGPT-class LLM on your laptop (that sounded crazy a year ago; to me it now looks inevitable within a few years), then have a look at the kinds of things that can be done today with LangChain... then extrapolate.
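To make "the kinds of things that can be done with LangChain" concrete, here is a minimal sketch of wiring a locally-run llama.cpp model into a LangChain chain. It assumes the langchain and llama-cpp-python packages; the model path is hypothetical and the exact class names may shift between LangChain versions.

    # Minimal sketch: a local llama.cpp model driving a LangChain chain.
    # Assumes `pip install langchain llama-cpp-python`; the model path is hypothetical.
    from langchain.llms import LlamaCpp
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain

    llm = LlamaCpp(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=2048)

    prompt = PromptTemplate(
        input_variables=["question"],
        template="Answer briefly and concretely: {question}",
    )
    chain = LLMChain(llm=llm, prompt=prompt)

    print(chain.run(question="Why does unified memory help with running large models?"))

Everything runs on the laptop; no API key, no subscription, no data leaving the machine.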




It sounds like we are in a similar position. I had no desire to get a 64 GB laptop from Apple until all the interesting things from running LLaMA locally came out. I wasn't even aware of the specific benefit of the unified memory model on the Mac. Now I'm weighing whether I want 64, 96, or 128 GB, and that top-end configuration costs an insane amount of money, around $5k.


The unified memory ought to be great for running LLaMA on the GPU on these MacBooks (since it can't run on the Neural Engine currently).

The point of llama.cpp is that most people don't have a GPU with enough VRAM; Apple's unified memory ought to solve that.

Some people apparently have it working:

https://github.com/remixer-dec/llama-mps
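To make the unified-memory point concrete, here is a small sketch using PyTorch's MPS backend: the same pool of RAM backs both CPU and GPU tensors, so a 64 GB machine can hold weights that would never fit in a typical discrete GPU's VRAM. The device selection calls are real PyTorch APIs; the tensor shapes are toy stand-ins, not actual LLaMA weights.

    import torch

    # Sketch: selecting Apple's GPU through PyTorch's MPS backend.
    # The tensor sizes below are toy stand-ins, not real LLaMA weights.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    weights = torch.randn(4096, 4096, device=device)   # allocated in unified memory
    activations = torch.randn(1, 4096, device=device)

    out = activations @ weights                         # matmul runs on the GPU
    print(out.shape, out.device)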


Thank you, that's exactly what I was looking for: specific info on performance.


I think GPU inference performance is currently limited mostly by the immaturity of PyTorch's MPS (Metal) backend.

Before I found the repo above, I made a naive attempt to get LLaMA running with MPS, and it didn't "just work": a bunch of ops weren't supported, etc.
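For anyone hitting the same wall, PyTorch's PYTORCH_ENABLE_MPS_FALLBACK environment variable is a real escape hatch that routes unsupported ops back to the CPU instead of raising an error. A sketch follows; note the fallback is slow, so it only papers over the missing-op problem rather than fixing performance.

    import os

    # Must be set before torch is imported; unsupported MPS ops then fall back
    # to the CPU instead of raising NotImplementedError. Correct but slow.
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    print(f"running on {device}")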


I think llama.cpp will die soon, because the only models you can run with it are derivatives of a model that Facebook never intended to be publicly released, which means all serious usage of it is in legal limbo at best and simply illegal at worst. Even if you get a model that's clean and donated to the world, the quality is still not going to be competitive with the hosted models.

And yes, I've played with it. It was, and is, exciting. I can see use cases for it. However, none are achievable because the models are (a) not good enough and (b) too legally risky to use.


(A) is very use-case dependent. Even with some of the weaker small models available now, I can see devs using them to enhance their apps (e.g. local search, summaries, sentiment analysis, translations); a rough sketch follows below.

(B) llama.cpp supports GPT4All, which states that it's working on fixing your concern. This is from their README:

Roadmap Short Term

- Train a GPT4All model based on GPTJ to alleviate llama distribution issues.
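On the (A) point, here is a rough sketch of what "enhance their app" could look like with a small local model via the llama-cpp-python bindings. The model path is hypothetical, and a small quantized model will give rough rather than great output.

    # Rough sketch: app features (summary, sentiment) backed by a small local model
    # via llama-cpp-python. The model path and output quality are assumptions.
    from llama_cpp import Llama

    llm = Llama(model_path="./models/ggml-model-q4_0.bin", n_ctx=2048)

    def summarize(text: str) -> str:
        prompt = f"Summarize in one sentence:\n{text}\nSummary:"
        result = llm(prompt, max_tokens=64, stop=["\n"])
        return result["choices"][0]["text"].strip()

    def sentiment(text: str) -> str:
        prompt = f"Label the sentiment (positive/negative/neutral):\n{text}\nLabel:"
        result = llm(prompt, max_tokens=4, stop=["\n"])
        return result["choices"][0]["text"].strip()

    print(summarize("The new laptop has 64 GB of unified memory and runs local models."))
    print(sentiment("The battery life is fantastic."))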



