
lxgr · yesterday at 6:27 PM

Yes, but how good will the recall performance be? Just because your prompt fits into context doesn't mean that the model won't be overwhelmed by it.

When I last tried this with some Gemini models, they couldn't reliably identify specific scenes in a 50K word novel unless I trimmed down the context to a few thousand words.

> Having LLMs use simpler tools like grep based on an array of similar search terms and then evaluating what comes up is faster in many cases

Sure, but then you're dependent on either you or the model picking the right phrases to search for. With embeddings, you get much better search performance.
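
To make the tradeoff concrete, here's a minimal sketch (not code from the thread; it assumes the sentence-transformers library and the all-MiniLM-L6-v2 model, both illustrative choices):

    # Illustrative only: toy chunks, assumed library and model.
    import re
    from sentence_transformers import SentenceTransformer, util

    chunks = [
        "He rowed out past the breakwater at dawn.",
        "The keeper lit the lamp as the storm rolled in.",
        "She read the letter twice and said nothing.",
    ]

    # grep-style: recall depends on guessing the right phrases up front
    terms = ["lighthouse", "beacon", "signal"]
    keyword_hits = [c for c in chunks
                    if any(re.search(t, c, re.IGNORECASE) for t in terms)]
    print(keyword_hits)  # misses the keeper/lamp scene entirely

    # embedding-style: cosine similarity can find it without the exact word
    model = SentenceTransformer("all-MiniLM-L6-v2")
    scores = util.cos_sim(model.encode("the scene at the lighthouse"),
                          model.encode(chunks))[0]
    print(chunks[int(scores.argmax())])  # likely the keeper/lamp chunk

The keyword pass only returns chunks containing a guessed term, while the embedding pass ranks every chunk by semantic similarity to the query.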


Replies

Aurornis · yesterday at 6:34 PM

> Yes, but how good will the recall performance be? Just because your prompt fits into context doesn't mean that the model won't be overwhelmed by it.

With current models it's very good.

Anthropic used a needle-in-haystack example with The Great Gatsby to demonstrate the performance of their large context windows all the way back in 2023: https://www.anthropic.com/news/100k-context-windows

Everything has become even better in the nearly 3 years since then.
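
That kind of test is easy to reproduce yourself. A toy version, assuming the anthropic Python SDK, a local text file, and a model id that may need updating (all my assumptions, not Anthropic's actual harness):

    # Toy needle-in-a-haystack check; file name and model id are placeholders.
    import anthropic

    with open("gatsby.txt") as f:  # hypothetical local copy of the novel
        haystack = f.read()

    needle = "The best pizza in town is at Tony's on 5th Street."
    mid = len(haystack) // 2
    doc = haystack[:mid] + "\n" + needle + "\n" + haystack[mid:]

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=100,
        messages=[{"role": "user",
                   "content": doc + "\n\nWhere is the best pizza in town?"}],
    )
    print(reply.content[0].text)  # recall succeeds if it names Tony's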

> Sure, but then you're dependent on (you or the model) picking the right phrases to search for. With embeddings, you get much better search performance.

How are those embeddings generated?

You're dependent on the embedding model to generate embeddings the way you expect.
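
A quick sketch of that dependence (model names are illustrative, from the sentence-transformers catalog): two embedding models can assign very different similarity scores to the same query/document pair, and your retrieval quality inherits whichever notion of "close" you picked.

    # Same pair, two assumed models, possibly different similarity scores.
    from sentence_transformers import SentenceTransformer, util

    a = SentenceTransformer("all-MiniLM-L6-v2")
    b = SentenceTransformer("multi-qa-mpnet-base-dot-v1")

    q, doc = "java memory model", "a coffee grinder instruction manual"
    print(util.cos_sim(a.encode(q), a.encode(doc)))
    print(util.cos_sim(b.encode(q), b.encode(doc)))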
