Hacker News

AJRF (01/20/2025)

Just tried hf.co/unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M on Ollama and my oh my are these models chatty. They just ramble on for ages.
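
For anyone wanting to reproduce this, here is a minimal sketch of hitting the same model through Ollama's local HTTP API from Python. It assumes Ollama is running on the default port and the model tag has already been pulled (e.g. with `ollama pull hf.co/unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M`); the prompt is just a placeholder.

    # Sketch: query the GGUF via Ollama's local HTTP API and measure how long the reply is.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "hf.co/unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M",
            "messages": [{"role": "user", "content": "What is 17 * 23?"}],
            "stream": False,
        },
        timeout=600,
    )
    reply = resp.json()["message"]["content"]
    print(len(reply.split()), "words in the reply")  # a crude measure of how chatty it is
    print(reply)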


Replies

whitehexagon (01/20/2025)

I find QwQ 32B a bit like that. I asked for a recipe for something in Minecraft 1.8, and it was page after page of 'hmm, that still doesn't look right, maybe if I try...', although to be fair I did ask for an ASCII-art diagram of the result. It will be interesting to try a DeepSeek 32B QwQ if that is planned; otherwise I'm pretty happy with it.

I just wish less development chat were happening inside walled gardens, because none of these models seem to be much help with Zig.

zamadatix (01/21/2025)

I noticed that the smaller the model (whether from quantization or parameter count), the faster it ran... but the longer it fought itself. For the same Calc II-level problem, all the models eventually got an answer, but the distilled Qwen-32B at Q6 quant was the fastest to a completed answer.
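
A rough sketch of that kind of comparison, timing wall-clock time to a finished answer rather than tokens/sec. The model tags and the prompt below are placeholders, not the exact setup described above; substitute whatever quants you have pulled locally.

    # Sketch: time how long each local model takes to produce a *complete* answer.
    import time
    import requests

    MODELS = [
        "deepseek-r1:14b",   # placeholder tags
        "deepseek-r1:32b",
    ]
    PROMPT = "Evaluate the integral of x * e^x dx."  # stand-in for a Calc II-level problem

    for model in MODELS:
        start = time.time()
        resp = requests.post(
            "http://localhost:11434/api/chat",
            json={"model": model,
                  "messages": [{"role": "user", "content": PROMPT}],
                  "stream": False},
            timeout=1200,
        )
        content = resp.json()["message"]["content"]
        print(f"{model}: {time.time() - start:.1f}s to finish, {len(content)} chars of output")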

ilaksh (01/21/2025)

That's the point... the rambling is their reasoning process.
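
For what it's worth, the rambling from these R1-style distills is usually wrapped in think tags, so it can be split off from the final answer if you only want the conclusion. A minimal sketch (the exact tag depends on the model and chat template, so check your own output):

    # Sketch: separate the <think>...</think> reasoning block from the final answer.
    import re

    def split_reasoning(text):
        """Return (reasoning, answer); reasoning is '' if no think block is found."""
        m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
        if not m:
            return "", text.strip()
        return m.group(1).strip(), text[m.end():].strip()

    sample = "<think>17 * 23: 17 * 20 = 340, plus 17 * 3 = 51, so 391.</think>\nThe answer is 391."
    reasoning, answer = split_reasoning(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)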

bradhilton (01/20/2025)

They need to be trained with a small length penalty.
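
Roughly what that could look like in an RL-style reward: subtract a small per-token cost so that, between two correct completions, the shorter one scores higher. Purely illustrative; the coefficient and the correctness check are made-up placeholders, not anyone's actual training setup.

    # Sketch: fold a small length penalty into a toy RL reward.
    def reward(completion_tokens, is_correct, length_penalty=0.001):
        """1.0 for a correct answer, minus a small cost per generated token."""
        base = 1.0 if is_correct else 0.0
        return base - length_penalty * len(completion_tokens)

    # Between two correct answers, the shorter one now wins:
    print(reward(["tok"] * 200, True))    # 0.8
    print(reward(["tok"] * 2000, True))   # -1.0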