As someone that runs the infrastructure for a large OSS project: it's mostly Chinese AI firms. All the big name-brand AI firms play reasonably nice and respect robots.txt.
The Chinese ones are hyper-aggressive, with no rate limiting, just pure greed scraping. They'll scrape the same content hundreds of times in the same day.
Chinese AI crawlers have been generating a huge volume of requests over the past few weeks.
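
If you want to see the repeat-fetch pattern in your own logs, something like the sketch below works. It's just an illustration, not my actual tooling: the log path is hypothetical and it assumes the standard nginx/Apache combined log format.

    import re
    from collections import Counter

    # Assumes nginx/Apache "combined" log format; tweak the regex if yours differs.
    LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)')

    def repeat_fetches(log_path, threshold=100):
        # Count (client IP, URL) pairs and keep the ones fetched more than `threshold` times.
        hits = Counter()
        with open(log_path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                m = LOG_LINE.match(line)
                if m:
                    hits[(m["ip"], m["path"])] += 1
        return [(ip, path, n) for (ip, path), n in hits.most_common() if n > threshold]

    if __name__ == "__main__":
        # Hypothetical path; point it at one day of your own logs.
        for ip, path, n in repeat_fetches("/var/log/nginx/access.log"):
            print(f"{ip} fetched {path} {n} times")

With the well-behaved crawlers you'll see each URL hit once or twice a day; with these you'll see the same client pulling the same page hundreds of times.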
The Chinese are also sloppy. They will run those scrapers until they get banned and not give a fuck.
In my experience, they don't bother putting in the effort to obfuscate their origin or evade bans in the first place. They might try again later, but this particular setup was specifically engineered for resiliency.