It works; I've shipped this as a "local inference"/poor person's Ollama for low-end LLM tasks like search. The main win is that it's free and privacy-preserving, and (mostly) transparent to users in that they don't have to do anything, which is great for giving non-technical users local inference without making them do scary native things.
But keep in mind the actual experience for users is not great; the model download is orders of magnitude larger than the browser itself, and it has to finish before you get your first token back. That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.
> operating systems start reliably shipping their own prebaked models
Here's hoping that dystopia never happens.
> It works, I've shipped this as a "local inference"/poor person's ollama for low-end llm tasks like search
fantastic!
> the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back
sure, but does this mean the model is lazily downloaded? that is, if my site were the first to call the model, would the user be stuck waiting until the download finished at that point?
that sounds like a horrible user experience - maybe Chrome reduces the confusion by showing a download progress dialog or similar?
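For what it's worth, Chrome's built-in Prompt API (still behind flags/origin trials, and subject to change) exposes both of these: a sketch below checks whether the model is already on disk and, if the first call does trigger the download, surfaces progress to the page so you can show your own UI instead of a silent stall. The `LanguageModel` global and event shapes follow the current explainer and are assumptions, not a stable API.

```javascript
// Sketch, assuming Chrome's experimental built-in Prompt API.
// `LanguageModel`, `availability()`, and the `downloadprogress` monitor
// event come from the current explainer and may be renamed.
async function getLocalModel() {
  // Outside Chrome (or with the flag off) the global simply isn't there.
  if (typeof LanguageModel === 'undefined') return null;

  // Expected states: 'unavailable' | 'downloadable' | 'downloading' | 'available'.
  const status = await LanguageModel.availability();
  if (status === 'unavailable') return null;

  // If the model isn't cached yet, create() kicks off the (multi-GB)
  // download; the monitor lets the page render its own progress bar.
  return LanguageModel.create({
    monitor(m) {
      m.addEventListener('downloadprogress', (e) => {
        // `e.loaded` is reported as a 0..1 fraction in the explainer.
        console.log(`model download: ${Math.round(e.loaded * 100)}%`);
      });
    },
  });
}
```

So yes: lazily downloaded on first use, but the page can at least tell the user what's happening rather than leaving them staring at a hung tab.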
also, any idea what the on-disk impact is?
> That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.
Maybe the next big thing will be premium software subscriptions that bundle a rack of 5090s as an extra.