Hacker News

softwaredoug today at 5:41 PM

The real thing I think people are rediscovering with filesystem-based search is that there’s a type of semantic search that’s not embedding-based retrieval. One that looks more like how a librarian organizes files into shelves based on the domain.

We’re rediscovering forms of search we’ve known about for decades. And it turns out they’re more interpretable to agents.

https://softwaredoug.com/blog/2026/01/08/semantic-search-wit...


Replies

wielebny today at 5:56 PM

Someone simply assumed at some point that RAG must be based on vector search, and everyone followed.

czhu12 today at 6:56 PM

Similar effort with PageIndex [1], which basically creates a table-of-contents-like tree. Then an LLM traverses the tree to figure out which chunks are relevant for the context in the prompt.

1: https://github.com/VectifyAI/PageIndex
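
For anyone who wants to see the shape of the idea, here is a rough sketch of tree-guided retrieval. It is not PageIndex's actual API or algorithm: `Node`, `is_relevant`, and `collect_chunks` are hypothetical names, and the LLM's relevance judgment is replaced by a crude word-overlap check on node titles so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a table-of-contents-like tree (hypothetical structure)."""
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def is_relevant(query: str, node: Node) -> bool:
    """Stand-in for the LLM call: crude word overlap with the node title."""
    return bool(set(query.lower().split()) & set(node.title.lower().split()))


def collect_chunks(query: str, node: Node) -> list[str]:
    """Descend only into relevant branches; return the text of reached leaves."""
    if not node.children:
        return [node.text]
    chunks = []
    for child in node.children:
        if is_relevant(query, child):
            chunks.extend(collect_chunks(query, child))
    return chunks


toc = Node("Manual", children=[
    Node("Installation guide", children=[
        Node("Linux installation", "Run the installer script."),
    ]),
    Node("Billing and refund", children=[
        Node("Refund policy", "Refunds take 5 days."),
        Node("Invoice format", "Invoices are PDFs."),
    ]),
])

result = collect_chunks("refund policy", toc)
```

In a real setup `is_relevant` would presumably be a model call that sees the node's title or summary, which is where the interpretability comes from: every pruning decision is a readable yes/no on a named section.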

khalic today at 5:57 PM

This kind of circles back to ontological NLP, which used knowledge representation as a primitive for language processing. There is _a ton_ of work in that direction.

postalcoder today at 6:03 PM

Lovely blog post, first thing I've read in a while that feels like it was written by a human.

What do you think about knowledge graphs for RAG?

skeptrune today at 6:26 PM

I think it's cool that LLMs can effectively do this kind of categorization on the fly at relatively large scale. When you give the LLM tools beyond just "search", it really is effectively cheating.

UltraSane today at 5:58 PM

Inverted indexes have the major advantage of supporting Boolean operators.
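
A minimal sketch of why that falls out so naturally, assuming a toy whitespace tokenizer and a made-up three-document corpus (`build_index` and the sample `docs` are illustrative, not from any particular search engine): each term maps to a set of document ids, so AND, OR, and NOT become plain set operations.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


# Hypothetical corpus for illustration only.
docs = {
    1: "vector search for rag",
    2: "file system search for agents",
    3: "knowledge graphs for rag",
}
index = build_index(docs)

# Boolean operators reduce to set operations on posting lists.
rag_and_search = index["rag"] & index["search"]        # AND
rag_or_agents = index["rag"] | index["agents"]         # OR
search_not_vector = index["search"] - index["vector"]  # AND NOT
```

Production engines store sorted posting lists and merge them instead of using hash sets, but the logic is the same.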

whattheheckheck today at 5:55 PM

Turns out the millions of people in knowledge work aren't librarians, and they wing shit everywhere.