veunes • today at 6:41 AM • 0 replies • view on HN

That’s exactly where we’re headed. Architecturally, it makes zero sense to spin up an LLM in every app's userspace. Now that we have dedicated NPUs and GPUs, we need a unified system-level orchestrator to balance inference queues across programs, the same way the OS arbitrates access to the NIC or the audio stack. The browser should just make an IPC call to the system instead of hauling its own heavy inference engine along for the ride.
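
To make the "balance inference queues across programs" part concrete, here's a toy sketch of what I have in mind. Everything in it is made up for illustration: `InferenceOrchestrator`, `submit`, and `next_job` are hypothetical names, not any real OS or vendor API, and plain round-robin is just one assumed fairness policy among many. The real thing would sit behind an IPC boundary and actually dispatch to the NPU/GPU; this only models the queueing.

```python
from collections import OrderedDict, deque


class InferenceOrchestrator:
    """Toy stand-in for a system-level inference service (hypothetical)."""

    def __init__(self):
        # One FIFO queue per client program, kept in round-robin order.
        self._queues = OrderedDict()

    def submit(self, client, prompt):
        """What an app's IPC call would boil down to: enqueue a request."""
        self._queues.setdefault(client, deque()).append(prompt)

    def next_job(self):
        """Pick the next request, rotating across clients for fairness."""
        if not self._queues:
            return None
        client, queue = next(iter(self._queues.items()))
        prompt = queue.popleft()
        del self._queues[client]
        if queue:
            # Client still has pending work: move it to the back of the line.
            self._queues[client] = queue
        return client, prompt

    def drain(self):
        """Return all pending jobs in the order the accelerator would see them."""
        order = []
        while (job := self.next_job()) is not None:
            order.append(job)
        return order


if __name__ == "__main__":
    orch = InferenceOrchestrator()
    for prompt in ("summarize tab", "translate page", "autofill form"):
        orch.submit("browser", prompt)
    orch.submit("editor", "complete line")
    for client, prompt in orch.drain():
        print(client, prompt)
```

The point of the rotation is that a chatty client (the browser here) can't starve everyone else: the editor's single request gets served second instead of waiting behind the whole browser backlog.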