
kpw94 • today at 6:53 PM • 1 reply

> gemma (unsloth/gemma-4-26B-A4B-it-GGUF) models

Since you're running quantized (at UD-Q4_K_XL), check out the "qat" models (unsloth/gemma-4-26B-A4B-it-qat-GGUF)!

- https://huggingface.co/unsloth/gemma-4-26B-A4B-it-qat-GGUF (With "Jun 9 Update: Added MTP support.")

- https://blog.google/innovation-and-ai/technology/developers-...

Replies

TIL:

> Quantization-Aware Training (QAT) [...] allows preserving similar quality to bfloat16 while dramatically reducing the memory requirements to load the model
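
For a rough sense of the memory side of that claim, here is a back-of-envelope sketch. The numbers are assumptions, not measurements: 26e9 parameters is read off the "26B" in the model name, 4.5 bits per weight is a guess at the effective size of a 4-bit K-quant, and the helper `weight_memory_gib` is made up for illustration. It counts weights only, ignoring KV cache and activations, so real GGUF file sizes will differ.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 26e9   # assumption: taken from the "26B" in the model name
BF16_BITS = 16    # bfloat16 stores each weight in 16 bits
Q4_BITS = 4.5     # assumption: effective bits/weight for a 4-bit quant

bf16_gib = weight_memory_gib(N_PARAMS, BF16_BITS)
q4_gib = weight_memory_gib(N_PARAMS, Q4_BITS)

print(f"bfloat16 weights: {bf16_gib:.1f} GiB")
print(f"~4-bit weights:   {q4_gib:.1f} GiB")
print(f"reduction:        {bf16_gib / q4_gib:.2f}x")
```

The reduction comes from the bit width alone, so it would be about the same for a QAT and a non-QAT quant at the same bits per weight. What QAT is supposed to change, per the quote, is how much quality survives at that size.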
