I've been wanting to run LLMs locally, and it looks like there is a huge amount of interest from others as well in finally running and creating our own chat-style models.
I came across https://github.com/jmorganca/ollama in a wonderful HN submission a few days ago. I have a MacBook Pro M1 that was top of the line in 2022; the only problem is that it runs Debian, as I use Linux.
Could someone point me in the right direction, as a beginner, on how to run, for example, Wizard Vicuna Uncensored locally on Linux? I would very much appreciate it. Thanks for reading.
https://github.com/ggerganov/llama.cpp https://huggingface.co/TheBloke
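If it helps, here's roughly what that looks like on Debian (a sketch only -- the model URL, filename, and quantization level are illustrative, so check TheBloke's model page for the exact files available):

```shell
# Build llama.cpp from source (needs git, make, and a C/C++ compiler)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download a quantized model from TheBloke on Hugging Face
# (filename below is an example -- pick one from the repo's file list)
wget https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin

# Start an interactive chat session with the model
./main -m Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin -n 256 --color -i
```

Smaller quantizations (q4_0 and the like) trade a bit of quality for much lower RAM use, which is what makes the 7B and 13B models feasible on a 16GB laptop.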
You should be able to at least run the 7B and probably the 13B.
For reference, I can run the 7B just fine on my 2021 Lenovo laptop with 16GB of RAM (and Ubuntu 20.04).