Hacker News

marginalia_nu, today at 10:23 AM

Yeah that's where I started out in 2021. Been at it for almost 5 years now, last three of which full time. I'm indexing about 1.1 billion documents now off a single server.

Hard part is doing it at any sort of scale and producing useful results. It's easy to build something that indexes a few million documents. Pushing into billions is a bigger challenge, as you start needing a lot of increasingly intricate bespoke solutions.

Devlog here:

https://www.marginalia.nu/tags/search-engine/

And search engine itself:

https://marginalia-search.com/

(... though it's running a bit sub-optimally right now, as I'm using a ton of CPU cores to migrate the index to compressed postings lists. That will take about 4-5 days, I think.)
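For anyone unfamiliar with postings list compression, here is a minimal sketch of one common textbook approach: delta-encode the sorted document IDs, then store each gap as a variable-byte integer. This is only an illustration under that assumption, not necessarily the scheme Marginalia uses, and the function names `encode_postings` and `decode_postings` are made up for the example.

```python
def encode_postings(doc_ids):
    """Compress a sorted list of non-negative doc IDs (illustrative only)."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        # Store the gap to the previous ID; gaps are small when IDs are dense.
        gap = doc_id - prev
        prev = doc_id
        # Variable-byte encoding: 7 payload bits per byte, low bits first.
        # A set high bit means "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings: rebuild the original doc IDs."""
    doc_ids = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte, keep accumulating
        else:
            prev += gap  # last byte of this gap, undo the delta
            doc_ids.append(prev)
            gap = 0
            shift = 0
    return doc_ids


ids = [3, 7, 150, 100000]
encoded = encode_postings(ids)
restored = decode_postings(encoded)
```

The gain comes from the gaps being much smaller than the raw IDs, so most of them fit in one or two bytes instead of a fixed 4 or 8.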


Replies

rickette, today at 10:36 AM

Curious what (and how much) hardware you're running this on.
