What stops Kagi from indexing the internet themselves, instead of paying some guys to scrape search results from Google? One guy at Marginalia can do it, but an entire dev team at a PAID search engine can't?
As we've seen here on HN during the AI boom, it's not wonderful when a bunch of companies all use bots to scrape the entire web. Many sites only allow Google's crawlers in robots.txt, and the public will fight you hard if you scrape them without permission. It's one of those things where it would be better for everyone if search engines could pay for access to crawling work that only has to be done once.
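For what it's worth, a minimal robots.txt of the kind described would look like this: one group admitting only Googlebot, and a catch-all group turning everyone else away. The directives are standard; whether a given crawler actually honors them is voluntary.

    # Illustrative robots.txt: only Googlebot may crawl.
    User-agent: Googlebot
    Disallow:

    # Everyone else is asked to stay out entirely.
    User-agent: *
    Disallow: /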
I don't know about others, but we have special rules for Google, Bing, and a few others, rate-limiting them less aggressively than some random bot.
The problem is scrapers (mostly AI scrapers, from what we can tell). They will pound a site into the ground without caring, and they are getting increasingly good at hiding their tracks. The only reasonable way to deal with them is to rate-limit every IP by default and then lift some of those restrictions for known, well-behaved bots; a sketch of the idea is below. We will lift those restrictions if asked, and we regularly look at statistics to catch search engines we might have missed, but it's an uphill battle if you're new and unknown.
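A minimal sketch of that setup, assuming nginx (the bot names and rates are illustrative, not what we actually run). nginx treats an empty rate-limit key as "not accounted", which is what makes the exemption work:

    # Sketch only: rate-limit every client IP by default, exempt known bots.
    # Matching on User-Agent alone is spoofable; a real setup would confirm
    # e.g. Googlebot with a reverse-DNS lookup before exempting it.
    map $http_user_agent $limit_key {
        default         $binary_remote_addr;   # unknown clients: limit per IP
        ~*Googlebot     "";                    # empty key = not rate-limited
        ~*bingbot       "";
    }

    # One shared zone; 1 request/second per IP is an illustrative rate.
    limit_req_zone $limit_key zone=perip:10m rate=1r/s;

    server {
        listen 80;
        location / {
            limit_req zone=perip burst=5 nodelay;  # tolerate short bursts
            root /var/www/html;
        }
    }

Lifting a restriction for a newly discovered good bot is then just another regex line in the map, which is why reviewing the stats periodically is cheap to act on.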