Ollama has been deliberately misrepresenting R1 distill models as "R1" for marke...

wklauss · 2025-01-28T14:32:16 1738074736

Ollama is pretty clear about it, it's not like they are trying to deceive. You can also download the 671B model with Ollama, if you like.

whimsicalism · 2025-01-28T15:16:06 1738077366

no they are not, they intentionally remove every reference to this not being r1 from the cli and changed the names from the ones both Deepseek and Huggingface used.

blacklightpy · 2025-01-30T16:55:51 1738256151

Yet, I did not see a single issue made on the GitHub repository, and I just made one myself (https://github.com/ollama/ollama/issues/8698).

nyrikki · 2025-01-28T21:06:08 1738098368

They used short strings for the names, which is very different from deception.

https://ollama.com/search

> DeepSeek's first-generation of reasoning models with comparable performance to OpenAI-o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

Well I guess if you are in the Enterprise Java naming model you would expect something like "VisitorModelUtilsListGetterAdapterInterceptorMessageManagerDrivenObserverPool"

If you look at their API docs you will see:

    model: name of the model to push in the form of <namespace>/<model>:<tag>

I don't think there is any reason to jump to the conclusion it is some type of conspiracy here, just naming things based on a API that probably didn't think about distillation when they created it.

woadwarrior01 · 2025-01-28T14:37:27 1738075047

Yeah, they're so clear in fact that they call the distilled models "R1" in the url and everywhere on the page[1], instead of using the "DeepSeek-R1-Distill-" prefix, as DeepSeek themselves do[2].

[1]: https://ollama.com/library/deepseek-r1

[2]: https://github.com/deepseek-ai/DeepSeek-R1#deepseek-r1-disti...

horsawlarway · 2025-01-28T14:40:58 1738075258

I mean... yes. The DeepSeek announcement puts R1 right there in the name for those models. https://api-docs.deepseek.com/news/news250120

It's fairly clear that R1-Llama or R1-Qwen is a distill, and they're all coming directly from DeepSeek.

As an aside, at least the larger distilled models (I'm mostly running r1-llama-distill-70b) are definitely not the same thing as the base llama/qwen models. I'm getting better results locally, admittedly with the slower inference time as it does the whole "<think>" section.

Surprisingly - The content in the <think> section is actually quite useful on its own. If you're using the model to spitball or brainstorm, getting to see it do that process is just flat out useful. Sometimes more-so than the actual answer it finally produces.