It's a two-year-old base model with only 3B parameters, trained on only 100B tokens. It's still a research project at this point.
The new model they just released has impressive benchmark results: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
Except on GSM8K and math...