It might interest you to know that one or two months ago, I had Claude port BitNet to WebGPU from the reference implementation, so that it runs right in your browser as a local model. After some debugging, the port seemed to work, but the model didn't function as well as the reference implementation so I'll have to work on it for a while. You can see a debugging session livestreamed here[1]. The released model file was about a gigabyte, it fits in most people's GPU's. We were also able to successfully fine-tune it right in the browser.
There's a lot that you can do when the model size is that small, yet still powerful.
Our next step is that we want to put up a content distribution network for it where people can also share their diffs for their own fine-tuned model. I'll post the project if we finish all the parts.
[1] https://www.youtube.com/live/x791YvPIhFo?is=NfuDFTm9HjvA3nzN