The weights are quantized down to fewer bits to save memory; the quantization loss means the generations get worse.
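Roughly what that means (a toy sketch, not the actual GGUF block-quant scheme Ollama ships): round each weight onto a small integer grid with one scale factor, then dequantize. The rounding error is the "quantization loss", and it grows as you drop from 8-bit to 4-bit.

```python
import numpy as np

# Toy illustration of weight quantization: snap float32 weights onto a
# small integer grid, dequantize, and measure the rounding error.
rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=4096).astype(np.float32)

def fake_quantize(w, bits):
    qmax = 2 ** (bits - 1) - 1        # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.abs(w).max() / qmax    # single per-tensor scale (real formats use per-block scales)
    q = np.round(w / scale)
    return q * scale                  # the dequantized weights the model actually runs with

for bits in (8, 4):
    err = np.abs(weights - fake_quantize(weights, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.6f}")
```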
Ollama serves multiple versions, you can get Q8_0 from it too:
ollama run deepseek-r1:8b-llama-distill-q8_0
The real value of the unsloth uploads is that they were available before R1 showed up in Ollama's model library.