Hacker News

overfeed · yesterday at 10:49 PM

> If I run a big data hungry AI lab consuming training data at 100Gb/s it's much much easier to...

You are incorrectly assuming competence, thoughtful engineering and/or some modicum of care for negative externalities. The scraper may have been whipped up by AI and shipped an hour later after a quick 15-minute test against en.wikipedia.org.

Whoever the perpetrator is, they are hiding behind "residential IP providers", so there's no reputational risk. Further, AI companies already have a reputation for engaging in distasteful practices, but popular wisdom holds that they make up for the awfulness with utility. So even if it turns out to be a big org like OpenAI or Anthropic, people will shrug their shoulders and move on.


Replies

fancyfredbot · yesterday at 11:37 PM

Yes, I agree it's more likely incompetence than malice. That's another reason I don't think it's a lab. Even if you don't like the big labs, you can probably admit they are reasonably smart and competent.

Residential IP providers definitely don't remove reputational risk. There are many ways people can find out what you are doing. The main one is that your employees might decide to tell on you.

The IP providers are a great way of getting around Cloudflare etc. They are also reasonably expensive! I find it very plausible that these IP providers are involved, but I still don't understand who is paying them.
