Nice one.
I've been doing the same with a bit wider scope: the whole CrUX list, pruned to apex domains, looking for CMS signals. How's your throughput?
I'm not doing any headless browser stuff or making many requests, so it's hyper-optimised for speed.
I do grab robots.txt - didn't really see much in the way of llms.txt or humans.txt in the wild, does yours?
Ohh, Cloudflare verified bot status, interesting. I'll check that out.
I'm seeing about a 6.6% block rate, but that does climb over time.