Hacker News

stephen_g today at 1:47 AM

LLMs have other ways of accessing the content, they don’t need the Web Archive.

Every LLM company can afford to spin up a new subscriber account every day, proxy its traffic so it appears to come from different IPs across all sorts of ASNs, crawl until the account gets banned, and then do it again, and again, and again.


Replies

overfeed today at 2:21 AM

> LLMs have other ways of accessing the content, they don’t need the Web Archive.

What's the conclusion from this train of thought? Just because some burglars can pick locks doesn't mean you should leave your front door unlocked.

Locking a door (or robots.txt) is how one can establish mens rea for those who bypass the barrier.

Gigachad today at 2:46 AM

The legal implications would be different vs scraping publicly available content.
