Hacker News

delfinom yesterday at 11:05 PM

As someone who runs the infrastructure for a large OSS project: mostly Chinese AI firms. All the big-name AI firms play reasonably nice and respect robots.txt.

The Chinese ones are hyper-aggressive, with no rate limiting and pure greed scraping. They'll scrape the same content hundreds of times in the same day.


Replies

suburban_strike today at 1:28 AM

The Chinese firms are also sloppy. They run those scrapers until they get banned and don't give a fuck.

In my experience, they don't bother to obfuscate their source or evade bans in the first place. They might try again later, but this particular setup was specifically engineered for resilience.

rfmoz yesterday at 11:46 PM

Chinese AI firms have been sending large numbers of requests over the past few weeks.
