Llama.cpp, and you can download one of the quantized models directly from "TheBloke" on HF. I can't 100% vouch for it because I have no idea how it builds under Linux on Apple Silicon; I'd be very interested to know if there are any issues and how well it uses the processor.
Thanks, yes, I've seen someone else mention trying Llama.cpp. I'll see if I can set it up; I'm new to this, so I'll look for a guide on how to use Llama.cpp and report back on whether it builds and runs well on Apple Silicon. I think it would be a nice write-up for the community, as there isn't too much out there about Linux on AS in general.
https://github.com/ggerganov/llama.cpp
https://huggingface.co/TheBloke
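In case it helps with the write-up: I haven't tried this on Apple Silicon myself, but once you've downloaded a quantized model, a quick way to smoke-test it is the llama-cpp-python bindings (a separate pip package that builds llama.cpp under the hood). A rough sketch, where the model filename is just a placeholder for whatever file you grab from TheBloke's page:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Model path is a placeholder -- substitute the quantized file
# you actually downloaded from TheBloke on HF.
llm = Llama(model_path="./models/llama-7b.q4_0.bin", n_ctx=512)

# Run a short completion to confirm the build works at all.
output = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(output["choices"][0]["text"])
```

If that runs, the underlying llama.cpp build is working, and you can move on to the main CLI for real use.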
You should be able to run at least the 7B, and probably the 13B.
For reference, I can run the 7B just fine on my 2021 Lenovo laptop with 16GB of RAM (and Ubuntu 20.04).
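For a rough sense of why 16GB is plenty: a 4-bit quantized model needs about half a byte per parameter, plus some headroom for the context cache and the OS. Back-of-the-envelope numbers (my own rough arithmetic, not measurements):

```python
# Rough RAM estimate for 4-bit quantized models.
BYTES_PER_PARAM = 0.5  # ~4 bits per weight after q4 quantization

for name, params in [("7B", 7e9), ("13B", 13e9)]:
    weights_gb = params * BYTES_PER_PARAM / 1024**3
    # Leave a few GB spare for the KV cache, scratch buffers, and the OS.
    print(f"{name}: ~{weights_gb:.1f} GB for weights, plus overhead")
```

That works out to roughly 3.3 GB for the 7B and 6.1 GB for the 13B, so both should fit comfortably in 16GB.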