dvt on April 1, 2023 | on: Llama.cpp 30B runs with only 6GB of RAM now

This seems suspiciously like a bug (either in inference or in mmap reporting), as these models are not sparse enough for savings of this size to have any plausible source.