Hacker News

FieryMechanic, last Friday at 9:40 AM

The way most scrapers work (I've written plenty of them) is that you fetch the page, collect all the links on it, and drill down from there.
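
For illustration only, here is a rough sketch of that fetch-and-drill-down loop using the Python standard library. The names `LinkCollector` and `crawl`, the injected `fetch` callable, and the `max_pages` cap are all assumptions made for this example, not anything from a specific scraper. A real crawler would typically also restrict itself to certain hosts and respect robots.txt, which this sketch does not do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    `fetch(url)` is a caller-supplied function (hypothetical here) that
    returns the page's HTML as a string, or None if the page is unavailable.
    Returns the list of URLs that were fetched, in visit order.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; the point is only that a page with no links gives this loop nothing further to follow.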


Replies

conartist6, last Friday at 12:12 PM

So the easiest strategy to hamper them if you know you're serving a page to an AI bot is simply to take all the hyperlinks off the page...?

That doesn't even sound all that bad if you happen to catch a human. You could even tell them pretty explicitly with a banner that they're browsing the site in a no-links mode meant for AI bots. Put one link to an FAQ page in the banner, since that at least is easily cached.
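
A minimal sketch of what that no-links mode could look like, assuming the server has already decided (by some means not shown) that the request comes from a bot. `LinkStripper`, `strip_links`, `serve_without_links`, the banner text, and the `/faq` path are all hypothetical names invented for this example. It drops the `<a>` tags but keeps their text, and it also drops comments and the doctype for brevity.

```python
from html.parser import HTMLParser

# Hypothetical banner; "/faq" is an assumed path, not a real one.
NO_LINKS_BANNER = (
    '<div class="banner">You are browsing in no-links mode. '
    '<a href="/faq">Why?</a></div>'
)


class LinkStripper(HTMLParser):
    """Re-emits HTML with <a> tags removed and their inner text kept."""

    def __init__(self):
        # Keep entities such as &amp; unconverted so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if tag != "a":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_links(html):
    """Return html with every hyperlink removed (link text is preserved)."""
    parser = LinkStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def serve_without_links(html):
    """Prepend the banner (with its single FAQ link) to the stripped page."""
    return NO_LINKS_BANNER + strip_links(html)
```

For example, `strip_links('<p>See <a href="/x">this</a>.</p>')` would leave the paragraph and its text in place with nothing for a crawler to follow.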

tigranbs, last Friday at 9:44 AM

And obviously, you need things fast, so you parallelize a bunch!
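
As a rough illustration of that parallelism, a scraper might fan a batch of URLs out over a thread pool. This is only a sketch: `fetch_all`, the injected `fetch` callable, and the `max_workers` default are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Fetch many URLs concurrently and return a {url: result} dict.

    `fetch(url)` is a caller-supplied function (hypothetical here);
    threads suit this because the work is dominated by network waits.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(fetch, urls)))
```

From the server's side, this is why a single crawler can look like many simultaneous visitors.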
