Hacker News

minimaxir today at 5:14 PM

It's a bit awkward to release Gemma 4 12B (https://news.ycombinator.com/item?id=48385906), and then a canonical Q4_0 Gemma 4 12B a couple days later.

It's good that this post lists the expected VRAM usage for each model: the Q4_0 Gemma 4 12B needs 6.7GB, which does fit comfortably within the 16GB Google claims, although it confirms that only the quantized version will do so.
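As a rough sanity check on that figure, here is a back-of-the-envelope sketch in Python. It assumes a nominal 12e9 parameters and the llama.cpp-style Q4_0 layout (blocks of 32 weights stored as 16 bytes of 4-bit values plus a 2-byte fp16 scale, so 4.5 bits per weight). Both are assumptions on my part, not numbers from the post, and the estimate covers weights only, not KV cache or activations.

```python
# Rough weight-memory estimate. PARAMS and the bit widths are assumptions,
# not figures taken from the post.
PARAMS = 12e9  # nominal "12B"; the real parameter count may differ

# llama.cpp-style Q4_0: 32 weights per block, 16 bytes of 4-bit values
# plus one 2-byte fp16 scale per block.
Q4_0_BITS_PER_WEIGHT = (16 + 2) * 8 / 32  # 4.5 bits per weight


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (weights only)."""
    return params * bits_per_weight / 8 / 1e9


q4_gb = weight_gb(PARAMS, Q4_0_BITS_PER_WEIGHT)
bf16_gb = weight_gb(PARAMS, 16)  # 16-bit weights for comparison
print(f"Q4_0: {q4_gb:.2f} GB, 16-bit: {bf16_gb:.2f} GB")
```

Under those assumptions the Q4_0 weights come to about 6.75 GB, close to the listed 6.7GB, while 16-bit weights would need about 24 GB and so would not fit in 16GB.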

Relatedly, in Google's newly released Edge Gallery for macOS, Gemma 4 12B is explicitly listed as unsupported due to insufficient RAM even on a 16GB machine, but given the expected VRAM usage here the Q4_0 variant should definitely fit, and Google should fix that.


Replies

Aurornis today at 5:32 PM

I'm not sure why you think it's awkward to have multiple releases. It's better to release models and variations as they're ready, not withhold them all until everything is ready to release all at once.

The Q4_0 is a quantization-aware training (QAT) checkpoint. It's not a simple quantization of the original Gemma 4 12B.
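To make the distinction concrete, here is a minimal sketch of the quantize-then-dequantize step ("fake quantization") that both approaches share. `fake_quant_block` is a hypothetical helper using a simplified symmetric 4-bit scheme, not the exact Q4_0 rounding rule and not Google's training code. Post-training quantization applies a step like this once to finished weights; QAT applies it in the forward pass during training so the weights learn to tolerate the rounding.

```python
def fake_quant_block(block, levels=7):
    """Round a block of floats to a 4-bit-style grid and map back to floats.

    Simplified symmetric scheme (an assumption for illustration): one scale
    per block, integer codes clamped to [-levels - 1, levels].
    """
    amax = max(abs(w) for w in block)
    if amax == 0.0:
        return list(block)  # all-zero block: nothing to quantize
    scale = amax / levels
    return [
        max(-levels - 1, min(levels, round(w / scale))) * scale
        for w in block
    ]


weights = [0.1, -0.35, 0.7, 0.02]
roundtrip = fake_quant_block(weights)
errors = [abs(a - b) for a, b in zip(weights, roundtrip)]
print(roundtrip, max(errors))
```

The per-weight error is at most half a quantization step. A plain quantization accepts that error after the fact, whereas a QAT checkpoint has been trained with it in the loop.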

netdur today at 5:20 PM

Not sure if I understand you, but Q4 and QAT Q4 are different.
