Hacker News

seanwilson on 10/03/2024

Well, there's not a lot any crawler can do if a website is gated behind aggressive bot detection; for example, Puppeteer running through a proxy will hit similar problems. Even if a bypass is found, it could break tomorrow. I've rarely had support messages about this, and most of them were resolved by adding IP addresses or user-agent/header strings to an allow list, or by turning down how aggressive the bot detection is. Checkbot is more for crawling sites you have control over, so there are more options here.
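
As a rough illustration of the allow-list idea, here is a minimal sketch of the kind of check a site owner might run before bot detection kicks in. It is not how Checkbot or any particular bot-detection product works; the function name, the network range, and the "ExampleCrawler" token are all made up for the example.

```python
import ipaddress

# Hypothetical allow-list values; a real site would substitute its own.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
ALLOWED_AGENT_TOKENS = ["ExampleCrawler"]


def is_allowlisted(client_ip: str, user_agent: str) -> bool:
    """Return True if the request should skip bot detection."""
    ip = ipaddress.ip_address(client_ip)
    # Match on source address first.
    if any(ip in network for network in ALLOWED_NETWORKS):
        return True
    # Otherwise fall back to a substring match on the User-Agent header.
    return any(token in user_agent for token in ALLOWED_AGENT_TOKENS)
```

User-agent strings are trivial to spoof, so matching on IP address is the stronger of the two checks when the crawler runs from a known address.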

It is worrying, though, what this means for the future of web crawlers in general if most sites end up gated against all bots that aren't from major search engines.