Hacker News

kouteiheika, today at 3:33 PM

The model is natively quantized (i.e. it was trained that way in the first place, so this is not post-training quantization, which degrades performance).


Replies

knollimar, today at 5:44 PM

Isn't it only partially quantized? I thought there were some dense parts, but most of it is int4?

theanonymousone, today at 4:33 PM

But the Hugging Face link mentions BF16, F16, and I32?
