and the repo for this project: https://github.com/microsoft/BitNet
The demo they showed was full of repeated sentences. The 3B model looks quite dense, TBH. Did they just want to show the speed?