Hacker News

saltysalt today at 10:10 AM (8 replies)

I built my own web search index on bare metal, index now up to 34m docs: https://greppr.org/

People rely too much on other people's infra and services, which can be decommissioned anytime. The Google Graveyard is real.


Replies

orf today at 10:14 AM

Number of docs isn’t the limiting factor.

I just searched for “stackoverflow” and the first result was this: https://www.perl.com/tags/stackoverflow/

The actual Stackoverflow site was ranked way down, below some weird twitter accounts.

1718627440 today at 12:32 PM

The input on the results page doesn't work; you always need to return to the start page, on which the browser history is disabled. That's just confusing behaviour.

jfindley today at 12:14 PM

Unfortunately, the index is the easy part. Transforming user input into a series of tokens that are used to rank possible matches and return the top N, based on likely relevance, is the hard part, and I'm afraid this doesn't appear to do an acceptable job with any of the queries I tested.

There's a reason Google became so popular as quickly as it did. It's even harder to compete in this space nowadays, as the volume of junk and SEO spam is many orders of magnitude worse as a percentage of the corpus than it was back then.
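
To make the tokenize, rank, and top-N steps described above concrete, here is a minimal sketch in Python. It uses a plain TF-IDF score, which is an assumption chosen for illustration and not how greppr.org or Google ranks results; the function names (`tokenize`, `build_index`, `search`) and the sample documents are hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return per-document term counts and per-term document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each document counts once per term
    return term_counts, doc_freq


def search(query, term_counts, doc_freq, n=3):
    """Score every document against the query and return the top n doc ids."""
    total = len(term_counts)
    query_tokens = tokenize(query)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for token in query_tokens:
            if counts[token]:
                # Rare terms weigh more than terms found in many documents.
                idf = math.log(1 + total / doc_freq[token])
                score += counts[token] * idf
        if score > 0:
            scores[doc_id] = score
    return heapq.nlargest(n, scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
docs = {
    "a": "stack overflow questions about perl",
    "b": "perl tags and perl modules",
    "c": "cooking recipes",
}
term_counts, doc_freq = build_index(docs)
top = search("perl modules", term_counts, doc_freq)
```

A scorer this simple has no notion of site authority or spam, which is exactly the gap the comment points at: matching tokens is cheap, and deciding which match deserves the top slot is the hard part.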

tosti today at 12:33 PM

This is pretty cool. Don't let the naysayers stop you. Taking a stab at beating Google at their core product is bravery in my book. The best of luck to you!

1718627440 today at 12:41 PM

You should consider filtering by input language. Showing the same Wikipedia article in different languages is not helpful when I am searching in English. Also, you might unify entries by URL: it currently shows the same URL several times, just with different publish dates. That is interesting and might be useful, but it should maybe be behind a toggle, as it is confusing at first.
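
One way the two suggestions above could be combined is sketched below in Python. The result fields (`url`, `lang`, `published`) and the function name `filter_and_dedupe` are assumptions made for illustration; nothing here reflects how greppr.org actually stores its results.

```python
from datetime import date


def filter_and_dedupe(results, lang):
    """Keep results in `lang`, with one entry per URL (the newest one).

    The output keeps the position at which each URL first appeared,
    so the original ranking order is preserved.
    """
    newest = {}
    for result in results:
        if result["lang"] != lang:
            continue
        seen = newest.get(result["url"])
        if seen is None or result["published"] > seen["published"]:
            # Reassigning an existing key keeps its original dict position.
            newest[result["url"]] = result
    return list(newest.values())


# Hypothetical ranked results, for illustration only.
results = [
    {"url": "https://en.wikipedia.org/wiki/Perl", "lang": "en", "published": date(2023, 1, 5)},
    {"url": "https://de.wikipedia.org/wiki/Perl", "lang": "de", "published": date(2023, 2, 1)},
    {"url": "https://en.wikipedia.org/wiki/Perl", "lang": "en", "published": date(2024, 3, 9)},
    {"url": "https://www.perl.com/", "lang": "en", "published": date(2022, 7, 1)},
]
cleaned = filter_and_dedupe(results, "en")
```

A toggle like the one suggested could simply skip the deduplication step and show every dated copy of a URL.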

renegat0x0 today at 10:15 AM

I also made something for my own search needs. It's just an SQLite table of domains and places. I have your search engine in there too ;-)

https://github.com/rumca-js/Internet-Places-Database

Demo for the most important ones: https://rumca-js.github.io/search

johnofthesea today at 10:38 AM

I tested it using a local keyword, as I normally do, and it took me to a Wikipedia page I didn’t know existed. So thanks for that.

lolive today at 11:35 AM

Lol, a GooglePlus URL was mentioned on a webpage I browsed this week. #blastFromThePast
