gliptic on April 1, 2023 | on: Llama.cpp 30B runs with only 6GB of RAM now

The size mentioned is already quantized (and to integers, not floats). mmap obviously doesn't do any quantization.
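
To illustrate the point, here is a minimal sketch, assuming a toy int8 scheme (this is not llama.cpp's actual file format, and `quantize_to_int8` and `write_and_map` are hypothetical helpers). The quantization happens once, when the file is written; mapping the file afterwards just exposes the same bytes.

```python
import mmap
import os
import struct
import tempfile


def quantize_to_int8(values, scale):
    """Toy quantizer (an assumption for illustration): floats -> clamped int8."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def write_and_map(qvalues):
    """Write int8 values to a temp file, then read them back through mmap.

    Returns (bytes_written, bytes_seen_through_mmap).
    """
    payload = struct.pack(f"{len(qvalues)}b", *qvalues)
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            # Length 0 maps the whole file; no conversion is applied.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                mapped = bytes(m[:])
        return payload, mapped
    finally:
        os.remove(path)


weights = [0.5, -1.0, 0.25, 2.0]
q = quantize_to_int8(weights, scale=1 / 64)      # quantization step, done up front
file_bytes, mapped_bytes = write_and_map(q)      # mmap step, changes nothing
print(file_bytes == mapped_bytes, len(mapped_bytes))
```

The mapped view has the same length and contents as the file on disk, so any size reduction comes from the quantization step, not from mmap.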
