
dvduval • today at 2:09 PM • 9 replies

The broader problem of original sources not being given credit in a way that rewards them remains. Website owners are paying to host their content so that spiders can come, crawl it and feed it into AI models. If they’re lucky, they might get a citation, but otherwise there’s very little reward for being a content provider. And of course, this is getting worse and worse. Why look at a website when it’s all in AI? The counter to that may be that we need to start closing websites to crawlers and putting everything behind a login.

Replies

Ensorceled • today at 2:18 PM

Worse, the constant AI scraping is actually costing content providers additional money for no return. At least Google/Bing/Yahoo scraping was used to provide links back to your content.


motbus3 • today at 2:19 PM

About a year ago, OpenAI crawled the company I work for at DDoS levels, even though our robots.txt disallowed it and despite the reCAPTCHA we managed to put up in time.

We found our data in their models' outputs, but who can do anything about it...


b00ty4breakfast • today at 4:12 PM

>Why look at a website when it's all in AI?

Well, at least in the case of Google, I'm pretty sure that's the point. Or at least, they are doing things that seem to move them towards being an oracle with all the answers rather than a signpost that points you in the right direction. The destination rather than the gateway.


spacechild1 • today at 2:41 PM

It's actually costing them money/time! A friend of mine is a sysadmin at a university and he constantly has to deal with AI crawler DDoS-ing his servers. He said Anthropic is actually one of the worst offenders.

These AI companies are really just a gross example of the motto "Socialize the costs, privatise the profits". It's disgusting!

aaarrm • today at 2:39 PM

Is it possible to host your website in a way that it can't be found via search engines (and thus, I hope, wouldn't be crawlable)?

I know this has repercussions on findability, but if that weren't a concern, I'm curious how one might avoid getting crawled.


wolttam • today at 2:21 PM

I’ve been thinking of a proof-of-work scheme for accessing content, where you effectively need to mine some crypto for the author, but this idea might not fly today.
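A minimal sketch of the general proof-of-work idea, assuming a hashcash-style puzzle rather than actual cryptocurrency mining: the server issues a random challenge, and the client must find a nonce whose SHA-256 hash of `challenge:nonce` starts with a given number of zero bits. Solving costs roughly 2^difficulty hashes, while checking costs one, so a single reader pays a small cost per page and a crawler fetching millions of pages pays it millions of times. The function names (`make_challenge`, `solve`, `verify`) and the difficulty values are hypothetical; the work here is simply discarded, and routing it to the author as mined crypto is not shown.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the position of the highest set bit in this byte
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge() -> str:
    """Server side (hypothetical): issue a fresh random challenge per request."""
    return os.urandom(16).hex()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce meeting the difficulty target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's answer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=16)  # about 65,000 hashes on average
    print(nonce, verify(challenge, nonce, difficulty=16))
```

Each extra bit of difficulty doubles the expected work for the client while verification stays at one hash, which is what makes the cost asymmetric between a casual reader and a bulk scraper.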


gabbagool • today at 3:43 PM

I agree with this wholeheartedly. What's the point of even having copyright law at this point?

What's even crazier to think about is that to use the latest versions of these models, for which you supplied training data, you have to pay hundreds of dollars a month. I would love to get a settlement check proportional to my share of the model weights. Even if it's $0.10, at least everyone out there would get what they're owed.


WarmWash • today at 2:41 PM

[flagged]


internet2000 • today at 3:07 PM

Perhaps we should go back to when the internet was about sharing information you liked, not about credit or making money on "content".
