A parameter can be a float of any size. Lots of downloadable models are FP8 (8 bits per parameter), but this model appears to be FP16 (16 bits per parameter).
Often, training is done in FP16 and the weights are then quantized down to FP8 or FP4 for distribution.
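Back-of-the-envelope, weight size is just parameters × bits ÷ 8; here's a minimal sketch of what that means for file size (the 70B parameter count is only an illustrative assumption, and this ignores non-weight overhead like metadata and tokenizer files):

```python
# Rough on-disk weight size: num_params * bits_per_param / 8 bytes.
# Ignores non-weight overhead (metadata, tokenizer, etc.).
def weight_size_gb(num_params: float, bits_per_param: int) -> float:
    return num_params * bits_per_param / 8 / 1e9  # decimal GB

# 70B parameters is an assumed count, purely for illustration.
for fmt, bits in [("FP16/bfloat16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"70B params @ {fmt}: {weight_size_gb(70e9, bits):.0f} GB")
# 70B params @ FP16/bfloat16: 140 GB
# 70B params @ FP8: 70 GB
# 70B params @ FP4: 35 GB
```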
I think they are bfloat16, not FP16, but both are 16-bits-per-weight formats, so it makes no difference to the size.
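To illustrate why the size comes out the same (a minimal sketch, not how real converters round): bfloat16 is literally the top half of an IEEE float32 (it keeps the 8-bit exponent and drops mantissa bits), while FP16 re-splits the same 16 bits as 1 sign / 5 exponent / 10 mantissa. Either way, each weight occupies exactly 2 bytes.

```python
import struct

# bfloat16 = top 16 bits of an IEEE float32 (1 sign, 8 exponent, 7 mantissa);
# FP16 re-splits the same width as 1 sign, 5 exponent, 10 mantissa.
# Both store each weight in exactly 2 bytes, hence identical file sizes.
def to_bfloat16_bits(x: float) -> int:
    f32_bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return f32_bits >> 16  # simple truncation; real converters usually round

print(hex(to_bfloat16_bits(1.0)))   # 0x3f80
print(hex(to_bfloat16_bits(3.14)))  # 0x4048
```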