logoalt Hacker News

fine_tune07/31/20253 repliesview on HN

RAG is taking a bunch of docs, chunking them it to text blocks of a certain length (how best todo this up for debate), creating a search API that takes query (like a google search) and compares it to the document chunks (very much how your describing). Take the returned chunks, ignore the score from vector search, feed those chunks into a re-ranker with the original query (this step is important vector search mostly sucks), filter those re-ranked for the top 1/2 results and then format a prompt like;

The user ask 'long query', we fetched some docs (see below), answer the query based on the docs (reference the docs if u feel like it)

Doc1.pdf - Chunk N Eat cheese

Doc2.pdf- Chunk Y Dont eat cheese

You then expose the search API as a "tool" for the LLM to call, slightly reformatting the prompt above into a multi turn convo, and suddenly you're in ze money.

But once your users are happy with those results they'll want something dumb like the latest football scores, then you need a web tool - and then it never ends.

To be fair though, its pretty powerful once you've got in place.


Replies

base69807/31/2025

Or you find your users search for id strings like k1231o to find ref docs and end up needing key word search and reranking.

Valk3_08/01/2025

Sorry for my lack of knowledge, but I've been wondering what if you ask a question to the RAG, where the answer to the question is not close in embedding space to the embedded question? Will that not limit the quality of the result? Or how does a RAG handle that? I guess maybe the multi-turn convo you mentioned helps in this regard?

The way I see RAG is it's basically some sort of semantic search, where the query needs to be similar to whatever you are searching for in the embedding space order to get good results.

show 1 reply
criddell07/31/2025

Is RAG how I would process my 20+ year old bug list for a piece of software I work on?

I've been thinking about this because it would be nice to have a fuzzier search.

show 2 replies