naasking • today at 3:21 PM

It's a two-year-old base model with only 3B parameters, trained on only 100B tokens. It's still a research project at this point.

gardnr • today at 4:02 PM

The new model they just released has impressive benchmark results: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T

Except on GSM8K and math...