A couple of WebGPU-based LLM runtimes provide the building blocks to do this right in the browser, which is practical mainly because the models are so small:
WebLLM: https://github.com/mlc-ai/web-llm
Transformers.js: https://huggingface.co/docs/transformers.js/en/index
You do have to worry about WebGPU compatibility though, since not every browser and device combination supports it yet.
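
For the compatibility part, here's a rough sketch of a feature check you could run before loading either library. It relies on the standard `navigator.gpu.requestAdapter()` entry point; the `hasWebGPU` helper and the `NavigatorLike` / `GpuLike` types are names I made up for illustration so the snippet compiles without extra typings, not part of WebLLM or Transformers.js.

```typescript
// Minimal structural types (hypothetical, for illustration only) so this
// compiles without pulling in the @webgpu/types package.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Resolves to true only if the WebGPU entry point exists AND the browser
// actually hands back an adapter. requestAdapter() can resolve to null when
// the API is exposed but no suitable GPU is available.
async function hasWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure during adapter lookup as "not supported".
    return false;
  }
}
```

In a page you would pass the global `navigator` (a cast may be needed depending on your TypeScript DOM typings) and only start downloading model weights if the promise resolves to `true`; otherwise show a message or fall back to something server-side.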