Most websites, particularly those behind Cloudflare, are very restrictive even to crawlers that obey robots.txt. Proof: I've sunk a ton of my time into this over the last year, and my crawlers very carefully obey robots.txt.
It's hard to see how this isn't extorting folks by offering a working solution that, oh, Cloudflare doesn't block. As long as you pay Cloudflare.
Perhaps I'm overly cynical, but I'd be quite surprised if Cloudflare subjected their own headless browsing to the same rules the rest of the internet gets.
>Most websites, particularly those behind Cloudflare, are very restrictive even to crawlers that obey robots.txt. Proof: I've sunk a ton of my time into this over the last year, and my crawlers very carefully obey robots.txt.
The docs are pretty unequivocal though:
>If you use Cloudflare products that control or restrict bot traffic such as Bot Management, Web Application Firewall (WAF), or Turnstile, the same rules will apply to the Browser Rendering crawler.
It's not just robots.txt. Most (all?) restrictions that apply to outside bots apply to Cloudflare's bot as well, or at least that's what they're claiming. If they're being this explicit about it, I'm willing to give them the benefit of the doubt until there's evidence to the contrary, rather than being a cynic and assuming the worst.