I want to like Ollama, but I wish it didn't obfuscate the actual directives (the full prompt) that it sends to the underlying model. Ollama uses Go-template syntax in its Modelfiles to translate user input into the format a specific model expects ([INST] tags, etc.), but it can be hard to tell whether the template is working as expected because the final rendered prompt doesn't show up in the logs at all.
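For anyone curious, the template lives in the Modelfile as a TEMPLATE directive; a minimal sketch of a Mistral-style instruct template (variable names per Ollama's Modelfile docs, simplified here):

FROM mistral
# Wrap the system and user prompts in Mistral's [INST] tags
TEMPLATE """[INST] {{ .System }} {{ .Prompt }} [/INST]"""

The complaint stands: the rendered result of that template never gets logged anywhere you can inspect.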
Other than that it's a great project - very easy to get started with, and it has a solid API implementation. I've got it running both in Docker under Windows 10 + WSL2 and on an M1 Mac.
Yeah, I guess I could compare the output at temperature 0.0 with the same prompt, once through the Modelfile and once in raw mode with my best guess at the raw prompt the Modelfile template is producing.
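Something like this, assuming /api/generate's raw flag and temperature option (both in Ollama's API docs): send the same question once templated and once raw, then diff the responses:

curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Why is the sky blue?", "stream": false, "options": {"temperature": 0}}'
curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "[INST] Why is the sky blue? [/INST]", "raw": true, "stream": false, "options": {"temperature": 0}}'

If the two outputs match at temperature 0, the hand-written [INST] wrapping in the second call is presumably close to what the template produces.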
I'd push a PR to the repo itself but I have zero experience with Go...
Yeah, I was surprised Ollama was not mentioned, as it's by far the easiest to get started with. If only it had real grammar support, I'd never have to use another library again (it does have a JSON mode that generally works, though).
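For what it's worth, the JSON mode is just a format flag on the generate request; a minimal sketch (the docs also advise telling the model to respond in JSON in the prompt, otherwise it can emit whitespace indefinitely):

curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "List three primary colors as JSON with a single key \"colors\". Respond using JSON.", "format": "json", "stream": false}'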
What is grammar support? I've seen it mentioned several times now. Does it let you restrict the output to a given template, or am I totally wrong there?
Ollama is great. I discovered it today while looking for a way to serve LLMs locally for my terminal command generator tool (cmdh: https://github.com/pgibler/cmdh) and was able to get it up and running and implement support for it very easily.
Yesterday I tried Mixtral 8x7B running on the CPU. With an 11th-gen Intel chip and 64 GB of DDR4 at 3200 MHz, I got around 2-4 tokens/second with a small context; it gets progressively slower as the context grows.
You would get a much better experience with Apple silicon and lots of RAM:
https://ollama.ai/library/mistral
curl https://ollama.ai/install.sh | sh
ollama run mistral:text