Kudos on your release! I know this was just made available, but here's some early feedback:
- Somewhere in the README, consider adding a note that the `-DWEIGHT_TYPE=hwy::bfloat16_t` flag is needed for non-sfp weights. Maybe around step 3.
- The README should explicitly say somewhere that there's no GPU support (at the moment).
- "Failed to read cache gating_ein_0 (error 294)" is pretty obscure. I think even "(error at line number 294)" would be a big improvement when it fails to FindKey.
- There's something odd about the 2b vs. 7b models. The 2b will claim it's trained by Google, but the 7b won't. Were they trained on the same data?
- Are the .sbs weights the same weights as the GGUF? I'm getting different answers compared to llama.cpp. Do you know of a good way to compare the two? Any way to make both deterministic? Or even dump probability distributions on the first (or any) token to compare? (Rough sketch of what I mean below.)
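On that last point, here's roughly the comparison I have in mind — a minimal sketch assuming each runtime can be patched to dump the logits for the first generated token, one float per line in the same vocab order (the filenames and dump format are made up):

```python
# compare_first_token.py - compare two dumped next-token distributions.
import numpy as np

def load_probs(path):
    # One logit per line, indexed by vocab id (hypothetical dump format).
    logits = np.loadtxt(path, dtype=np.float64)
    z = logits - logits.max()  # numerically stabilized softmax
    p = np.exp(z)
    return p / p.sum()

p = load_probs("gemma_cpp_logits.txt")   # hypothetical dump from gemma.cpp
q = load_probs("llama_cpp_logits.txt")   # hypothetical dump from llama.cpp
assert p.shape == q.shape, "vocab sizes differ - not comparable"

# KL divergence and top-10 overlap give a quick sense of how far apart the
# two implementations are on the very first sampling decision.
kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)))
print(f"KL(p || q) = {kl:.6f}")

topk = 10
overlap = set(np.argsort(p)[-topk:]) & set(np.argsort(q)[-topk:])
print(f"top-{topk} overlap: {len(overlap)}/{topk}")
```

As for determinism, greedy decoding (temperature 0 / top-k 1) should at least take sampling randomness out of the picture, though different thread counts can still reorder floating-point reductions.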
The weights should be the same across formats, but it's easy for divergence to creep in from quantization and/or subtle implementation differences. Minor implementation differences have been a pain point in the ML ecosystem for a while (w/ IRs, onnx, python vs. runtime, etc.), but hopefully the differences aren't too significant (if they are, it's a bug in one of the implementations).
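As a toy illustration of the quantization side (plain numpy; this uses truncation rather than the round-to-nearest a real converter would use):

```python
import numpy as np

def to_bfloat16(x):
    # bfloat16 keeps float32's sign and 8-bit exponent but only 7 mantissa
    # bits; dropping the low 16 bits of the float32 encoding approximates
    # the conversion (real converters round to nearest even).
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

w = np.float32(0.1234567)
print(float(w), float(to_bfloat16(w)))  # ~0.1234567 vs 0.123046875
```

A per-weight error like that is tiny on its own, but it compounds across layers and can nudge sampled tokens in visibly different directions.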
Thanks, I'm glad to see your time machine caught my comment.
I'm using the 32-bit GGUF model from the Google repo, not a different quantized model, so I could have one less source of error. It's hard to tell with LLMs whether it's a bug: it just gives slightly stranger answers sometimes, but they're not complete gibberish, incoherent sentences, or full of extra punctuation like some other LLM bugs I've seen.
Still, I'll wait a few days and rebuild llama.cpp to see if anything changes.