
K0balt today at 10:29 AM

It absolutely can be pointed to any standard endpoint, either cloud or local.

It’s far better for most users to be able to specify an inference server (even on localhost in some cases) because the ecosystem of specialized inference servers and models is a constantly evolving target.
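In practice that just means swapping a base URL. A minimal sketch, assuming an Ollama or vLLM server exposing the usual OpenAI-compatible /v1 endpoint (the port, key, and model name here are only illustrative defaults, not anything specific to the project being discussed):

```python
# Hypothetical sketch: pointing an OpenAI-compatible client at a local server
# instead of a cloud API. Only the base_url (and key) need to change.
from openai import OpenAI

# Local Ollama server on its default port; vLLM typically serves on :8000/v1.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="llama3.1",  # whichever model the local server has loaded
    messages=[{"role": "user", "content": "Hello from a local endpoint"}],
)
print(response.choices[0].message.content)
```

Pointing the same client at a hosted provider is just a different base_url and a real API key, which is exactly why front-end tooling should treat the endpoint as configuration rather than bundling an engine.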

If you write this kind of software, trying to integrate your own inference engine instead of focusing on your agentic tooling not only reinvents the wheel but also probably disadvantages your users. Ollama, vLLM, Hugging Face, and others are devoting their focus to the servers; there is no reason to sacrifice the front-end tooling effort to duplicate their work.

Besides that, most users will not be able to run the better models on their daily driver, and will have a separate machine for inference, be running inference in a private or rented cloud, or even go over a public API.


Replies

backscratches today at 10:39 AM

It is not local-first. Local is not the primary use case. The name is misleading to the point that I almost didn't click, because I do not run local models.
