Hacker News

croemer · 01/20/2025 · 2 replies

Can someone ELI5 what the difference is between using the "quantized version of the Llama 3" from unsloth instead of the one that's on ollama, i.e. `ollama run deepseek-r1:8b`?


Replies

whereismyacc · 01/20/2025

The weights are quantized down to fewer bits in order to save on memory. The quantization loss is going to result in worse generations.
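To make the trade-off concrete, here is a minimal sketch of symmetric int8 quantization in NumPy. It is purely illustrative: the function names are my own, and the actual GGUF quants used by unsloth/ollama (e.g. q4_K_M) use more sophisticated grouped schemes, but the memory-vs-error trade is the same idea.

```python
import numpy as np

# Illustrative symmetric int8 quantization (NOT the actual GGUF scheme).
def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0          # map the max magnitude onto the int8 range
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)   # stand-in for a weight tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Memory drops 4x (float32 -> int8); the rounding error is the "quantization loss".
max_err = float(np.abs(w - w_hat).max())
print(w.nbytes, q.nbytes, max_err)
```

The reconstruction error per weight is bounded by half the scale step, which is why lower-bit quants (4-bit, 2-bit) save more memory but degrade generations more.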

dragonwriter · 01/21/2025

They are probably the same model. Unsloth does model quants and provides them to the community; AFAIK ollama doesn't do its own quantization, it just indexes publicly available models, whether full or quantized, for convenient use in its frontend.