> When you click it, the locally running LLM gets a copy of the web site in the context window, and you get to ask it a prompt, e.g. "summarize this".
I'm also now imagining my GPU whirring into life and the accompanying sound of a jet plane getting ready for takeoff, as my battery suddenly starts draining visibly.
Local LLMs are a pipe dream; the technology fundamentally requires far too much computation for any true intelligence to ever make sense with current computing hardware.
That's the point. For things like summarizing a webpage or letting the user ask questions about it, not that much computation is required.
An 8B model running under Ollama on a middle-of-the-road MacBook can do this effortlessly today, without any whirring. In several years, that will probably be true of all laptops.
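For concreteness, the whole "summarize this page" flow is basically one HTTP call to the local Ollama server. Here's a rough sketch in Python, assuming Ollama is running on its default port (11434) and an 8B model such as llama3:8b has already been pulled; extracting the page text is hand-waved:

    # Minimal sketch: ask a locally running Ollama server (default port 11434)
    # to summarize some page text. Model name and page text are placeholders.
    import json
    import urllib.request

    page_text = "...extracted text of the web page..."

    payload = json.dumps({
        "model": "llama3:8b",   # any locally pulled ~8B model
        "prompt": "Summarize this page:\n\n" + page_text,
        "stream": False,        # one JSON response instead of a token stream
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])

Setting "stream" to False keeps the example simple; a real browser integration would stream tokens so the summary appears as it's generated.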
Most laptops are now shipping with an NPU for handling exactly these tasks, so it won't be running on your GPU anyway.