danielhanchen • yesterday at 3:46 PM • 5 replies

We re-uploaded Gemma4 4 times. 3 of those were due to 20 llama.cpp bug fixes, some of which we helped solve as well. The 4th is an official Gemma chat template improvement from Google themselves, so these are out of our hands. All providers had to re-fix their uploads, not just us.

For MiniMax 2.7 - there were NaNs, but it wasn't just ours - all quant providers had them. We found that 38% of bartowski's quants had NaNs; for ours it was 22%. We identified a fix and have already fixed ours, see https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/minimax.... Bartowski has not yet, but is working on it. We always share our investigations.

For Qwen3.5 - we shared our 7TB of research artifacts showing which layers not to quantize. All providers' quants were suboptimal, not broken: the ssm_out and ssm_* tensors were the issue. We're now the best in terms of KLD and disk space - see https://www.reddit.com/r/LocalLLaMA/comments/1rgel19/new_qwe...
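
For readers unfamiliar with the metric: KLD here is the KL divergence between the next-token distribution of the original model and that of the quantized model, usually averaged over many token positions. The snippet below is only a minimal sketch of that computation on raw logits. The function names are made up for illustration, and it is not the evaluation pipeline used for the numbers above.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(ref_logits, quant_logits):
    """KL(P || Q), where P comes from the reference (unquantized) logits
    and Q from the quantized model's logits at the same token position."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(ref_rows, quant_rows):
    """Average KL divergence over a list of token positions."""
    values = [kl_divergence(r, q) for r, q in zip(ref_rows, quant_rows)]
    return sum(values) / len(values)
```

A value of 0 means the quantized model reproduces the reference distribution exactly; larger values mean the quant drifts further from the original.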

On other fixes, we also fixed bugs in many OSS models like Gemma 1, Gemma 3, Llama (chat template fixes), Mistral, and many more.

It might seem these issues are due to us, but that's because we publicize them and tell people to update. 95% of them are not related to us, but as good open source stewards, we should keep everyone updated.

Replies

evilduck • yesterday at 4:36 PM

I just wanted to express gratitude to you guys, you do great work. However, it is a little annoying to have to redownload big models, and keeping up with the AI news and community sentiment is a full-time job. I wish there was some mechanism somewhere (on your site or Huggingface or something) for displaying feedback or confidence in a model being "ready for general use" before kicking off 100+ GB model downloads.

solomatov • yesterday at 10:58 PM

Just curious, the fixes are not about weights but about templates, am I right?

sowbug • yesterday at 3:57 PM

Please publish sha256sums of the merged GGUFs in the model descriptions. Otherwise it's hard to tell if the version we have is the latest.
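
If checksums were published, checking a local file against them could look roughly like the sketch below. The helper names are hypothetical, and it assumes the published value is a plain hex SHA-256 digest of the merged file.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in 1 MiB chunks so a 100+ GB GGUF
    never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """True if the local file's digest equals the published hex digest."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

For example, `matches_published("model.gguf", "<digest from the model card>")` returning False would mean the local copy is not the version the digest was computed from.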

magicalhippo • yesterday at 7:31 PM

Appreciate the work of your team very much.

Though chat templates seem like they need a better solution. So many issues, seems quite fragile.

dist-epoch • yesterday at 4:18 PM

What do you think about creating a tool which can just patch the template embedded in the .gguf file instead of forcing a re-download? The whole file hash can be checked afterwards.
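
As a rough illustration of what such a tool would first need to do, here is a read-only sketch that locates the embedded template in a GGUF file. It assumes GGUF version 2 or later, little-endian layout, and that the template lives under the `tokenizer.chat_template` metadata key as a string. The function names are made up for this example, and it does not patch anything: a template of a different length changes the string's length prefix and shifts every byte after it, so a real patcher would have to rewrite the rest of the file consistently.

```python
import struct

GGUF_MAGIC = b"GGUF"
# Byte sizes of fixed-width GGUF metadata value types, keyed by type id.
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1,
                10: 8, 11: 8, 12: 8}
TYPE_STRING, TYPE_ARRAY = 8, 9

def _read_u32(f):
    return struct.unpack("<I", f.read(4))[0]

def _read_u64(f):
    return struct.unpack("<Q", f.read(8))[0]

def _read_bytes_string(f):
    """GGUF string: uint64 length followed by that many bytes."""
    return f.read(_read_u64(f))

def _skip_value(f, vtype):
    """Advance past one metadata value without decoding it."""
    if vtype in SCALAR_SIZES:
        f.seek(SCALAR_SIZES[vtype], 1)
    elif vtype == TYPE_STRING:
        f.seek(_read_u64(f), 1)
    elif vtype == TYPE_ARRAY:
        elem_type = _read_u32(f)
        count = _read_u64(f)
        if elem_type in SCALAR_SIZES:
            f.seek(SCALAR_SIZES[elem_type] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")

def find_chat_template(f):
    """Return (offset_of_length_prefix, template_text) or None.

    `f` is a seekable binary file object positioned at the start of the file.
    """
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    if _read_u32(f) < 2:
        raise ValueError("this sketch only handles GGUF v2+")
    _read_u64(f)              # tensor count, not needed here
    kv_count = _read_u64(f)   # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_bytes_string(f).decode("utf-8")
        vtype = _read_u32(f)
        if key == "tokenizer.chat_template" and vtype == TYPE_STRING:
            offset = f.tell()
            return offset, _read_bytes_string(f).decode("utf-8")
        _skip_value(f, vtype)
    return None
```

Calling `find_chat_template(open("model.gguf", "rb"))` would give the byte offset of the template string and its current contents, which is enough to compare against a corrected template before deciding whether a re-download is needed.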
