Hacker News

croemer · 01/20/2025 · 2 replies

Can someone ELI5 what the difference is between using the "quantized version of the Llama 3" from unsloth instead of the one that's on ollama, i.e. `ollama run deepseek-r1:8b`?


Replies

whereismyacc · 01/20/2025

The weights are quantized down to fewer bits in order to save on memory. The quantization loss is going to result in worse generations.
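To make the trade-off concrete, here is a minimal sketch of symmetric int8 quantization in NumPy. It is purely illustrative: the function names are my own, and the actual GGUF quants used by unsloth/ollama (e.g. q4_K_M) use more sophisticated grouped schemes, but the memory-vs-error trade is the same idea.

```python
import numpy as np

# Illustrative symmetric int8 quantization (NOT the actual GGUF scheme).
def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0          # map the max magnitude onto the int8 range
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)   # stand-in for a weight tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Memory drops 4x (float32 -> int8); the rounding error is the "quantization loss".
max_err = float(np.abs(w - w_hat).max())
print(w.nbytes, q.nbytes, max_err)
```

The reconstruction error per weight is bounded by half the scale step, which is why lower-bit quants (4-bit, 2-bit) save more memory but degrade generations more.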

dragonwriter · 01/21/2025

They are probably the same model. Unsloth does model quants and provides them to the community; AFAIK ollama doesn't do its own quantization, it just indexes publicly available models, whether full or quantized, for convenient use in its frontend.