Hacker News

augusteo, today at 2:18 AM

On the API vs local model question:

We went with API embeddings for a similar use case. The cold-start latency of local models across multiple workers ate more money in compute than just paying per-token. Plus you avoid the operational overhead of model updates.

The hybrid approach in this article is smart. Fuzzy matching catches 80% of cases instantly, embeddings handle the rest. No need to run expensive vector search on every query.
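
A rough sketch of that two-stage idea, for illustration only: it is not the article's code. `hybrid_match`, `toy_embed`, and the 0.85 cutoff are hypothetical names and values. `toy_embed` is a character-trigram stand-in so the example runs offline; in practice you would swap in a call to whatever embedding API you use.

```python
import difflib
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in for a real embedding API call (assumption): trigram counts."""
    t = text.lower()
    return dict(Counter(t[i:i + 3] for i in range(max(len(t) - 2, 1))))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_match(
    query: str,
    titles: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    fuzzy_cutoff: float = 0.85,  # hypothetical threshold, tune on your data
) -> Tuple[Optional[str], str]:
    """Return (best_title, stage), where stage is 'fuzzy', 'embedding' or 'none'."""
    if not titles:
        return None, "none"

    # Stage 1: cheap string similarity; handles typos and near-duplicates.
    lowered = {t.lower(): t for t in titles}
    close = difflib.get_close_matches(
        query.lower(), list(lowered), n=1, cutoff=fuzzy_cutoff
    )
    if close:
        return lowered[close[0]], "fuzzy"

    # Stage 2: only queries that miss the fuzzy stage pay for embeddings.
    q_vec = embed(query)
    best = max(titles, key=lambda t: cosine(q_vec, embed(t)))
    return best, "embedding"


titles = ["The Great Gatsby", "Moby Dick", "Pride and Prejudice"]
typo_result = hybrid_match("The Great Gatsbi", titles)
reordered_result = hybrid_match("Gatsby, the great (novel)", titles)
```

With a real embedding model you would precompute and cache the title vectors instead of embedding them on every query; the sketch recomputes them only to stay short.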


Replies

TurdF3rguson, today at 2:36 AM

Those text embeddings are dirt cheap. You can do around 1M titles on the Cloudflare embedding model I used last time without exceeding the daily free tier.
