I was able to run the 4-bit quantized LLaMA 2 7B on a 2070 Super, though latency was so-so.
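
For anyone wanting to reproduce this, one way to do the 4-bit load is transformers + bitsandbytes (a sketch, not necessarily what I used; the model id and settings are illustrative, and a GPTQ or GGML build would work too):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Illustrative: NF4 4-bit quantization via bitsandbytes, fp16 compute.
    model_id = "meta-llama/Llama-2-7b-hf"
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb, device_map="auto"
    )

    inputs = tok("Explain 4-bit quantization briefly:", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))

The 4-bit weights fit comfortably in the 2070 Super's 8 GB of VRAM, which is the whole point of quantizing the 7B model.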

I was surprised by how fast it runs on an M2 MBP + llama.cpp; way, way faster than ChatGPT, and that's not even using the Apple Neural Engine.
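
If you'd rather drive it from Python than the raw CLI, the bindings are a few lines (a sketch; assumes llama-cpp-python built with Metal support and a 4-bit GGUF conversion on disk, filename illustrative):

    from llama_cpp import Llama

    # Illustrative filename; any 4-bit GGUF conversion of LLaMA 2 7B works.
    llm = Llama(
        model_path="./llama-2-7b.Q4_K_M.gguf",
        n_ctx=2048,
        n_gpu_layers=-1,  # offload all layers to the Apple GPU via Metal
    )

    out = llm("Q: Why is llama.cpp fast on Apple Silicon? A:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])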




It runs fantastically well on an M2 Mac + llama.cpp; a variety of factors in the Apple hardware make it possible: the ARM fp16 vector intrinsics, the MacBook's AMX co-processor, the unified memory architecture, etc.

It's more than fast enough for my experiments, and the laptop doesn't seem to break a sweat.
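
"Fast enough" is easy to quantify with a quick tokens-per-second check (a sketch using the llama-cpp-python bindings; filename illustrative):

    import time
    from llama_cpp import Llama

    llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf", n_gpu_layers=-1, verbose=False)

    t0 = time.perf_counter()
    out = llm("Write a haiku about laptops.", max_tokens=64)
    dt = time.perf_counter() - t0

    n = out["usage"]["completion_tokens"]
    print(f"{n} tokens in {dt:.2f}s -> {n / dt:.1f} tokens/s")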



