Hacker News

saltysalt, today at 10:42 AM

I'll work on that adjustment. It's fair feedback, thanks!


Replies

direwolf20, today at 11:18 AM

Unfortunately this is the bulk of search engine work. Recursive scraping is easy in comparison, even with CAPTCHA bypassing. You either limit the index to only highly relevant sites (as Marginalia does) or you must work very hard to separate the spam from the ham. And spam in one search may be ham in another.
