I wonder if a 1B model could be close to free to host. It seems inevitable; I just wonder how long it'll take to be real.
A 1B model at 2-bit quantization is about the size of the average web page these days. With some WebGPU support you could run such a model right in the browser.
I'm half joking. Web pages are ludicrously fat these days.
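For what it's worth, the back-of-envelope math checks out. A rough sketch, assuming a flat 1 billion parameters at 2 bits each and ignoring file-format overhead:

```python
# Back-of-envelope: on-disk size of a 1B-parameter model at 2-bit quantization.
# Assumes exactly 1e9 parameters; real checkpoints carry some extra overhead.
params = 1_000_000_000
bits_per_param = 2

size_bytes = params * bits_per_param // 8  # 8 bits per byte
size_mb = size_bytes / 1_000_000

print(f"{size_mb:.0f} MB")  # 250 MB
```

250 MB is heavier than most web pages, but it's in the same ballpark as the fattest ones, which is the half-joke.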
I’m planning to deploy a 1B model, feed it all the documents I’ve ever written, host it on a $149 mini-PC in my bedroom, and let you chat with it.
I’ve released similar projects before.
I’ll drop a post about my plans in the coming days, and if there’s enough interest I’ll build and document it about two weeks later.
joeldare.com