delfinom • yesterday at 11:05 PM • 2 replies

As someone who runs the infrastructure for a large OSS project: mostly Chinese AI firms. All the big-name AI firms play reasonably nice and respect robots.txt.

The Chinese ones are hyper-aggressive: no rate limiting, just pure greedy scraping. They'll scrape the same content hundreds of times in the same day.

Replies

suburban_strike • today at 1:28 AM

The Chinese are also sloppy. They will run those scrapers until they get banned and not give a fuck.

In my experience, they do not bother putting in the effort to obfuscate source or evade bans in the first place. They might try again later, but this particular setup was specifically engineered for resiliency.

rfmoz • yesterday at 11:46 PM

Chinese AI crawlers have been making large numbers of requests over the past few weeks.
