
I'm similarly skeptical, but that said, I'm running 30B-parameter LLMs on my 32GB M1 MacBook Pro every day now. The trick is quantising them down to 4 (or even 3) bits, which massively reduces the memory requirements. Have a look at [1].
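As rough back-of-the-envelope numbers (my own ballpark, ignoring the per-block scale factors real quantised formats add, and the KV cache / activation overhead):

    # Approximate weight memory for a 30B-parameter model at various precisions.
    # Ballpark only: real formats like llama.cpp's q4_0 store per-block scales,
    # so the actual files come out somewhat larger.
    params = 30e9
    for bits in (16, 8, 4, 3):
        print(f"{bits:>2}-bit: ~{params * bits / 8 / 2**30:.1f} GiB")
    # 16-bit: ~55.9 GiB, 8-bit: ~27.9 GiB, 4-bit: ~14.0 GiB, 3-bit: ~10.5 GiB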

The devs working on llama.cpp have been discussing ways to further reduce the memory requirements by mmapping the large weight files (I thought LLMs mutated the weights during inference, but they clearly know more about the internals than I do), bringing it within reach of phone memory.
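The gist of it, as a toy Python sketch (not llama.cpp's actual implementation, which is C++, and the filename here is made up): map the weight file read-only so the OS pages it in lazily and can share or evict those pages, rather than copying everything into private heap memory up front.

    import mmap

    # Hypothetical weight file; llama.cpp's real loader is C++ and format-aware.
    with open("ggml-model-q4_0.bin", "rb") as f:
        weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Nothing is resident yet - pages are faulted in only when touched,
    # and read-only pages can be shared across processes or dropped by the OS.
    header = weights[:16]
    print(header.hex())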

So iPhones are not as far from having the computational capacity to run these models as you'd think. Memory (and to a greater extent, battery and cooling) are the limiting factors. iPads even less so, given they run M1 chips and have much larger batteries and much more RAM.

https://arxiv.org/abs/2210.17323




Off-topic, but for what purpose are you running LLMs locally (especially every day)? My understanding was that the amount of prompting required to make them work at all was too great.


A little bit of research, a little bit of actually useful work - I'm interested in summarisation, which alpaca is decent at (even compared to the existing summarisation-specific models I've tried).

My other motivation is making sure I understand what offline LLMs can do... while I use GPT-3 and 4 extensively, I don't want to send something over the wire if I don't have to (e.g. if I can summarise e-mails locally, I'd rather do that than send them to OpenAI).

It's also surprisingly good at defining things if I'm somewhere with no internet connectivity and want to look something up (although obviously that's not really what it's good at & hallucination risks abound)


What prompt are you using for summarization? I’ve tried several variations without consistent results.


On alpaca, I've found "Below is an instruction that describes a task. Write a response that appropriately completes the request. Summarise the following text: " or "Give me a 5 word summary of the following: " to work fairly well using the 30B weights.
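For illustration, here's the same prompt pattern driven from Python via the llama-cpp-python bindings - the model path, context size, sampling settings and the full Alpaca instruction template are my assumptions, not necessarily the setup described above:

    from llama_cpp import Llama

    # Assumed 4-bit Alpaca-30B weights in ggml format; adjust path/settings to taste.
    llm = Llama(model_path="ggml-alpaca-30b-q4_0.bin", n_ctx=2048)

    text = "...the text to summarise..."
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\nSummarise the following text: " + text +
        "\n\n### Response:\n"
    )
    out = llm(prompt, max_tokens=256, temperature=0.2)
    print(out["choices"][0]["text"].strip())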

It's certainly nowhere close to the quality of OpenAI's summarisation, just better than what I previously had locally (e.g. when summarising a family history project with transcripts of old letters, gpt-3.5-turbo was able to accurately read between the lines of an original poem, which I found amazing).

I half wonder if the change in spelling from US -> UK makes a difference...

I'd run a test on that, but I've just broken my alpaca setup for longer prompts (I switched to mainline llama.cpp, which required a model conversion & some code changes, and it's no longer allocating enough memory).


Necessary if you have sensitive datasets you can’t share with a US company.


Off topic slightly, but are you running into limits with 32GB of RAM that the 64GB model would meaningfully address? Do you wish you had one of the larger-RAM models?


I've been pretty happy with 32GB, but the 30B models do push close to the limit. I don't see a big difference in quality between 65B (running on a 64GB x86 host) and 30B on the M1 (although that may be down to the 4-bit quantisation, so take it with a grain of salt). I'm just glad that I have it on an M1... I have a 3080 in my PC, but when I got that I was thinking more of Stable Diffusion and YOLO tasks than LLMs, and it just doesn't have the VRAM for them.

Alpaca seems like it could be significantly improved with better training (some of the old training data was truncated), so I think there's a decent amount of improvement to be had at the current model size.

In the future, though, what would really be a meaningful change is a larger context size - the 8k-token context of GPT-4 was a big improvement for my uses. I would guess a future local LLM with a larger context would exceed 32GB, but that's speculation beyond my expertise; I don't know how memory scales with context size and network size.
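For a sense of scale: the extra memory for a longer context mostly comes from the KV cache, which grows linearly with context length. A rough sketch using roughly LLaMA-30B-sized dimensions (assumed figures, and real implementations may cache at a different precision):

    # KV cache per token ~ 2 (K and V) * n_layers * d_model * bytes per value.
    # Dimensions are approximately LLaMA-30B's; treat results as order-of-magnitude.
    n_layers, d_model = 60, 6656
    bytes_per_val = 2  # fp16 cache
    per_token = 2 * n_layers * d_model * bytes_per_val  # ~1.6 MB per token
    for ctx in (2048, 8192, 32768):
        print(f"{ctx:>6} tokens: ~{per_token * ctx / 2**30:.1f} GiB of KV cache")
    # roughly 3 GiB at 2k, 12 GiB at 8k, and ~49 GiB at 32k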

If it were a PC I'd say go for 64GB, but it's hard to recommend that given how much Apple charge for RAM upgrades. On my next upgrade (2+ years from now, hopefully) I'll likely opt for 64GB+ though.


Yeah, it is expensive. My other strong consideration is battery life, since DRAM is always running; going from 32 to 64 would be a hit to battery life regardless of workload, but hard to say exactly how big of a hit.

I'm curious, which configuration of the M1 MBP do you have?


I went for the 16" with the M1 Max w/32 GPU cores and a 1TB SSD (500GB free; I offload most large files to my NAS/iCloud). On the added power usage, my understanding is that it's less of a concern with LPDDR5?

The only drawback I've found with the M1 Max model is that the added weight from the bigger heatsink makes it a hair heavier than I'd like when picking it up one-handed at the front while open... and that in winter the case is cold no matter what you're running - I used to love that my Intel MBP acted as a mini leg warmer :-)



