Hacker News

zomiaen yesterday at 9:47 PM

How many of these scrapers are written with AI by data-science folks who don't remotely care how often they're hitting the sites, and for whom request rate is a detail they wouldn't even think to mention to the LLM or ask it about?


Replies

iamnothere yesterday at 10:15 PM

But does that explain all of the various scrapers doing the same thing across the same set of sites? And again, the sheer bandwidth and CPU time involved should eventually bother the bean counters.

I did think of a couple of possibilities:

- Someone has a software package or list of sites out there that people are using instead of building their own scrapers, so everyone hits the same targets with the same pattern.

- There are a bunch of companies chasing a (real or hoped-for) “scraped data” market, perhaps overseas where overhead is lower, and there’s enough excess AI funding sloshing around that they are able to scrape everything mindlessly for now. If this is the case, then the problem should fix itself as funding gets tighter.
