
whereismyacc · 01/20/2025

The weights are quantized down to fewer bits to save memory; the quantization loss will result in worse generations.
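To make the tradeoff concrete, here is a minimal NumPy sketch of Q8_0-style block quantization (illustrative only; llama.cpp's actual format and kernels differ in detail). Each block of 32 weights stores one fp16 scale plus 32 int8 values, so memory drops to roughly half of fp16, and the round-trip error is the quantization loss:

    import numpy as np

    BLOCK = 32  # Q8_0 quantizes weights in blocks of 32

    def quantize_q8_0(w: np.ndarray):
        """Quantize a 1-D fp32 weight array to int8, one scale per block."""
        w = w.reshape(-1, BLOCK)
        # Per-block scale so the largest magnitude maps to 127.
        scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
        scale[scale == 0] = 1.0  # avoid divide-by-zero for all-zero blocks
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale.astype(np.float16)

    def dequantize_q8_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
        """Reconstruct approximate fp32 weights from int8 values and scales."""
        return (q.astype(np.float32) * scale.astype(np.float32)).reshape(-1)

    w = np.random.randn(1024).astype(np.float32)
    q, scale = quantize_q8_0(w)
    w_hat = dequantize_q8_0(q, scale)

    # 1 byte per weight + 2 bytes per 32-weight block, vs 2 bytes/weight fp16.
    print(f"fp16: {w.size * 2} bytes, q8_0: {q.nbytes + scale.nbytes} bytes")
    print(f"max abs round-trip error: {np.abs(w - w_hat).max():.5f}")

Lower-bit formats (Q4 and below) shrink memory further but make this round-trip error, and hence the quality drop, correspondingly larger.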


Replies

ColonelPhantom · 01/20/2025

Ollama serves multiple versions; you can get Q8_0 from it too:

ollama run deepseek-r1:8b-llama-distill-q8_0

The real value of the Unsloth ones is that they were uploaded before R1 appeared in Ollama's model list.
