On the API vs local model question:
We went with API embeddings for a similar use case. Keeping local models warm across multiple workers (or eating the cold-start latency on every scale-up) cost more in compute than just paying per token. Plus you avoid the operational overhead of model updates.
The hybrid approach in this article is smart. Fuzzy matching catches ~80% of cases instantly, and embeddings handle the rest, so there's no need to run an expensive vector search on every query. The flow is basically what the sketch below shows.
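A minimal sketch of the two-tier lookup, assuming precomputed title vectors; the names (`match_title`, `embed`, `TITLES`) are mine, and `difflib` stands in for whatever fuzzy matcher the article actually uses:

```python
# Fuzzy-first, embeddings-fallback matching. Toy data throughout; the
# vectors here are random stand-ins for real embedding-API output.
import difflib
import numpy as np

TITLES = ["The Matrix (1999)", "Blade Runner 2049", "Arrival"]
title_vecs = np.random.rand(len(TITLES), 768)  # pretend: precomputed embeddings
title_vecs /= np.linalg.norm(title_vecs, axis=1, keepdims=True)

def embed(text: str) -> np.ndarray:
    """Placeholder for the embedding API call (see the Cloudflare sketch below)."""
    v = np.random.rand(768)
    return v / np.linalg.norm(v)

def match_title(query: str, fuzzy_cutoff: float = 0.85) -> str:
    # Cheap path: a plain string-similarity match settles most queries.
    hits = difflib.get_close_matches(query, TITLES, n=1, cutoff=fuzzy_cutoff)
    if hits:
        return hits[0]
    # Expensive path: embed the query, take the nearest title by cosine.
    q = embed(query)
    return TITLES[int(np.argmax(title_vecs @ q))]
```

The cutoff is the knob: set it high enough that the fuzzy tier only fires on near-exact matches, and the embedding call becomes a rare fallback instead of a per-query cost.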
Those text embeddings are dirt cheap. You can embed around 1M titles on the Cloudflare embedding model I used last time without exceeding the daily free tier.
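For reference, hitting it over the REST API looks roughly like this. The account ID and token are placeholders, and @cf/baai/bge-base-en-v1.5 is just one model from the Workers AI catalog, not necessarily the one I used; batching titles per request keeps the call count down:

```python
# Rough sketch: batch-embedding titles via the Cloudflare Workers AI REST API.
import requests

ACCOUNT_ID = "..."  # your Cloudflare account id (placeholder)
API_TOKEN = "..."   # a Workers AI API token (placeholder)
MODEL = "@cf/baai/bge-base-en-v1.5"
URL = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"

def embed_batch(titles: list[str]) -> list[list[float]]:
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"text": titles},  # the bge models accept a list of strings
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]["data"]  # one vector per input title

vecs = embed_batch(["The Matrix (1999)", "Blade Runner 2049"])
```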