But the huggingface link mentions BF16, F16, and I32?
Not every weight is quantized. For example, weights that take up little space or are especially important are left in higher precision. State-of-the-art weight quantization is never applied uniformly (i.e., to all weights and in the same way).
I don't believe safetensors has a native int4 dtype, so they packed 4 int4s into a bf16 in this checkpoint.
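To make the packing idea concrete, here is a minimal sketch of squeezing four signed 4-bit values into one 16-bit word and getting them back out. It is only an illustration: NumPy has no bf16 dtype, so `uint16` stands in as the 16-bit container, the nibble order (lowest nibble first) is an assumption, and `pack_int4` / `unpack_int4` are hypothetical names, not anything from the checkpoint or from safetensors.

```python
import numpy as np

# Bit offsets of the four nibbles inside one 16-bit word (assumed order: low nibble first).
_SHIFTS = np.array([0, 4, 8, 12], dtype=np.uint16)


def pack_int4(values):
    """Pack signed 4-bit integers (range -8..7) into uint16 words, four per word."""
    values = np.asarray(values, dtype=np.int64)
    if values.size % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    if values.min() < -8 or values.max() > 7:
        raise ValueError("values must fit in a signed 4-bit range")
    # Keep only the low 4 bits (two's complement), e.g. -1 -> 0xF.
    nibbles = (values & 0xF).astype(np.uint16).reshape(-1, 4)
    # Shift each nibble into its slot and OR the four slots together.
    return np.bitwise_or.reduce(nibbles << _SHIFTS, axis=1).astype(np.uint16)


def unpack_int4(words):
    """Inverse of pack_int4: recover the signed 4-bit values as int8."""
    words = np.asarray(words, dtype=np.uint16)
    nibbles = ((words[:, None] >> _SHIFTS) & 0xF).astype(np.int8)
    # Sign-extend: raw nibbles 8..15 represent -8..-1.
    nibbles = np.where(nibbles >= 8, nibbles - 16, nibbles)
    return nibbles.reshape(-1)


vals = np.array([-8, -1, 0, 7, 3, -4, 5, -6])
packed = pack_int4(vals)
restored = unpack_int4(packed)
```

The stored dtype in a scheme like this is just a bag of bits: reading the packed tensor as 16-bit floats would give meaningless numbers, and the real values only appear after unpacking (and, in a real checkpoint, applying the quantization scales).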