The real thing I think people are rediscovering with file system based search is that there’s a type...

softwaredoug • today at 5:41 PM • 7 replies

The real thing I think people are rediscovering with filesystem-based search is that there’s a type of semantic search that’s not embedding-based retrieval. One that looks more like how a librarian organizes files into shelves based on the domain.

We’re rediscovering forms of search we’ve known about for decades. And it turns out they’re more interpretable to agents.

https://softwaredoug.com/blog/2026/01/08/semantic-search-wit...

Replies

wielebny • today at 5:56 PM

Someone simply assumed at some point that RAG must be based on vector search, and everyone followed.

czhu12 • today at 6:56 PM

Similar effort with PageIndex [1], which basically creates a table-of-contents-like tree. Then an LLM traverses the tree to figure out which chunks are relevant for the context in the prompt.

1: https://github.com/VectifyAI/PageIndex
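
To make the tree-traversal idea concrete, here is a rough sketch. It is not PageIndex's actual API or algorithm: `Node`, `is_relevant`, and `collect_chunks` are hypothetical names, and the LLM's relevance judgment is replaced by a naive word-overlap check on section titles so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a table-of-contents-like tree (hypothetical structure)."""
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def is_relevant(query, node):
    # Stand-in for an LLM call: true if the query shares a word with the title.
    return bool(set(query.lower().split()) & set(node.title.lower().split()))


def collect_chunks(query, node, judge=is_relevant):
    """Descend only into children the judge accepts; return leaf texts."""
    if not node.children:
        return [node.text]
    chunks = []
    for child in node.children:
        if judge(query, child):
            chunks.extend(collect_chunks(query, child, judge))
    return chunks
```

Swapping `judge` for a function that asks a model "is this section worth opening for this query?" would give the general shape described above, with whole branches pruned before any chunk text is read.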

khalic • today at 5:57 PM

This kind of circles back to ontological NLP, which used knowledge representation as a primitive for language processing. There is _a ton_ of work in that direction.

postalcoder • today at 6:03 PM

Lovely blog post, first thing I've read in a while that feels like it was written by a human.

What do you think about knowledge graphs for RAG?

skeptrune • today at 6:26 PM

I think it's cool that LLMs can effectively do this kind of categorization on the fly at relatively large scale. When you give the LLM tools beyond just "search", it really is effectively cheating.

UltraSane • today at 5:58 PM

Inverted indexes have the major advantage of supporting Boolean operators.
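
As a minimal sketch of why that works: an inverted index maps each term to the set of documents containing it, so AND, OR, and NOT reduce to set intersection, union, and difference. The function names and whitespace tokenizer below are illustrative assumptions, not any particular search engine's API.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search_and(index, *terms):
    # AND: documents that contain every term.
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    # OR: documents that contain at least one term.
    return set().union(*(index.get(t, set()) for t in terms))


def search_not(index, all_ids, term):
    # NOT: documents that do not contain the term.
    return set(all_ids) - index.get(term, set())
```

Queries compose by combining the returned sets, for example `search_and(index, "search", "rag") - search_or(index, "vector")`.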

whattheheckheck • today at 5:55 PM

Turns out the millions of people in knowledge work aren't librarians and they wing shit everywhere.
