
whereismyacc · 01/20/2025

The weights are quantized down to fewer bits to save memory; the quantization loss will result in worse generations.
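To make the tradeoff concrete, here is a minimal NumPy sketch of Q8_0-style block quantization (illustrative only; llama.cpp's actual format and kernels differ in detail). Each block of 32 weights stores one fp16 scale plus 32 int8 values, so memory drops to roughly half of fp16, and the round-trip error is the quantization loss:

    import numpy as np

    BLOCK = 32  # Q8_0 quantizes weights in blocks of 32

    def quantize_q8_0(w: np.ndarray):
        """Quantize a 1-D fp32 weight array to int8, one scale per block."""
        w = w.reshape(-1, BLOCK)
        # Per-block scale so the largest magnitude maps to 127.
        scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
        scale[scale == 0] = 1.0  # avoid divide-by-zero for all-zero blocks
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale.astype(np.float16)

    def dequantize_q8_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
        """Reconstruct approximate fp32 weights from int8 values and scales."""
        return (q.astype(np.float32) * scale.astype(np.float32)).reshape(-1)

    w = np.random.randn(1024).astype(np.float32)
    q, scale = quantize_q8_0(w)
    w_hat = dequantize_q8_0(q, scale)

    # 1 byte per weight + 2 bytes per 32-weight block, vs 2 bytes/weight fp16.
    print(f"fp16: {w.size * 2} bytes, q8_0: {q.nbytes + scale.nbytes} bytes")
    print(f"max abs round-trip error: {np.abs(w - w_hat).max():.5f}")

Lower-bit formats (Q4 and below) shrink memory further but make this round-trip error, and hence the quality drop, correspondingly larger.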


Replies

ColonelPhantom · 01/20/2025

Ollama serves multiple versions; you can get Q8_0 from it too:

ollama run deepseek-r1:8b-llama-distill-q8_0

The real value of the Unsloth ones is that they were uploaded before R1 appeared in Ollama's model list.
