
iamnothere yesterday at 10:15 PM

But does that explain all of the various scrapers doing the same thing across the same set of sites? And again, the sheer bandwidth and CPU time involved should eventually bother the bean counters.

I did think of a couple of possibilities:

- Someone has a software package or list of sites out there that people are using instead of building their own scrapers, so everyone hits the same targets with the same pattern.

- There are a bunch of companies chasing a (real or hoped-for) “scraped data” market, perhaps overseas where overhead is lower, and there’s enough excess AI funding sloshing around that they’re able to scrape everything mindlessly for now. If that’s the case, the problem should fix itself as funding gets tighter.


Replies

TurdF3rguson today at 4:21 AM

My theory on this one is that some serial wantrepreneur came up with a business plan of scraping the archive and feeding it into an LLM to identify some vague opportunity, then paid some Fiverr / Upwork kid in India $200 to get the data. The good news is that this website, and any other, can mitigate these things by moving behind Cloudflare, and it's free.
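
If you don't want to touch code at all, the free tier's Bot Fight Mode does a crude version of this from the dashboard. Something like the custom WAF rule below is roughly what I mean; the user-agent match is just an illustrative guess at a lazy scraper, not a known signature:

  Rule: challenge unverified scrapers (free Custom Rules / WAF)
  Expression: (not cf.client.bot and http.user_agent contains "python-requests")
  Action: Managed Challenge

Managed Challenge generally lets real browsers through while tripping up headless clients, so it's a low-effort first line of defense before anything fancier.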