On the API vs local model question:
We went with API embeddings for a similar use case. Keeping local models warm across multiple workers (or eating the cold-start latency on every scale-up) cost more in compute than just paying per token. Plus you avoid the operational overhead of model updates.
The hybrid approach in this article is smart. Fuzzy matching catches ~80% of cases instantly, and embeddings handle the rest, so there's no need to run an expensive vector search on every query. The flow is basically what the sketch below shows.
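A minimal sketch of the two-tier lookup, assuming precomputed title vectors; the names (`match_title`, `embed`, `TITLES`) are mine, and `difflib` stands in for whatever fuzzy matcher the article actually uses:

```python
# Fuzzy-first, embeddings-fallback matching. Toy data throughout; the
# vectors here are random stand-ins for real embedding-API output.
import difflib
import numpy as np

TITLES = ["The Matrix (1999)", "Blade Runner 2049", "Arrival"]
title_vecs = np.random.rand(len(TITLES), 768)  # pretend: precomputed embeddings
title_vecs /= np.linalg.norm(title_vecs, axis=1, keepdims=True)

def embed(text: str) -> np.ndarray:
    """Placeholder for the embedding API call (see the Cloudflare sketch below)."""
    v = np.random.rand(768)
    return v / np.linalg.norm(v)

def match_title(query: str, fuzzy_cutoff: float = 0.85) -> str:
    # Cheap path: a plain string-similarity match settles most queries.
    hits = difflib.get_close_matches(query, TITLES, n=1, cutoff=fuzzy_cutoff)
    if hits:
        return hits[0]
    # Expensive path: embed the query, take the nearest title by cosine.
    q = embed(query)
    return TITLES[int(np.argmax(title_vecs @ q))]
```

The cutoff is the knob: set it high enough that the fuzzy tier only fires on near-exact matches, and the embedding call becomes a rare fallback instead of a per-query cost.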
Those text embeddings are dirt cheap. You can embed around 1M titles on the Cloudflare embedding model I used last time without exceeding the daily free tier.
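For reference, hitting it over the REST API looks roughly like this. The account ID and token are placeholders, and @cf/baai/bge-base-en-v1.5 is just one model from the Workers AI catalog, not necessarily the one I used; batching titles per request keeps the call count down:

```python
# Rough sketch: batch-embedding titles via the Cloudflare Workers AI REST API.
import requests

ACCOUNT_ID = "..."  # your Cloudflare account id (placeholder)
API_TOKEN = "..."   # a Workers AI API token (placeholder)
MODEL = "@cf/baai/bge-base-en-v1.5"
URL = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"

def embed_batch(titles: list[str]) -> list[list[float]]:
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"text": titles},  # the bge models accept a list of strings
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]["data"]  # one vector per input title

vecs = embed_batch(["The Matrix (1999)", "Blade Runner 2049"])
```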