Hacker News

antonvs · today at 9:30 AM · 0 replies

> at such a scale, the main problem is to generate the embeddings for your items

Generation is often decoupled from querying, though. Consider LLMs, where training is a very expensive, slow, hardware-intensive process, whereas inference is much faster and much less intensive.

But the performance of inference is in many ways more important than the performance of training, because inference is what users interact with directly.
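A minimal sketch of that decoupling, assuming a numpy-based setup: the toy `embed` function below is a stand-in for a real model (it just seeds a random generator from a hash of the text), purely to show the shape of the pattern — an expensive offline pass builds the index once, and each user query is only one embedding plus a matrix-vector product.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy deterministic "embedding": seed a generator from the text.
    # In practice this would be a real (slow, hardware-intensive) model call.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "little")
    g = np.random.default_rng(seed)
    v = g.standard_normal(dim)
    return v / np.linalg.norm(v)

# Offline/batch step (analogous to training): embed every item once
# and store the vectors. This is the expensive part, done ahead of time.
items = ["red shoes", "blue shoes", "garden hose", "laptop stand"]
index = np.stack([embed(t) for t in items])

# Online step (analogous to inference, what users interact with):
# embed the query and score it against the precomputed index.
def search(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)        # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return [items[i] for i in top]

print(search("red sneakers"))
```

At real scale the brute-force `index @ query` scan gets replaced by an approximate nearest-neighbor index, but the split is the same: slow batch generation, fast per-query lookup.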