breppp • today at 11:04 AM • 7 replies

The issue here is not whether Anthropic used Common Crawl, Alibaba also does that.

The issue is that by distilling Claude, Alibaba reuses the IP Anthropic used to train the model; that's more akin to historical Chinese reverse-engineering methods and disrespect of IP.

Replies

snovv_crash • today at 11:34 AM

Alibaba paid for that data though, right? They didn't hack Anthropic, they bought accounts and ran them normally.

Also, you can't copyright AI outputs. So worst case they violated the ToS.

wongarsu • today at 3:53 PM

If using Common Crawl or Anna's Archive in your training data is legal, then surely the same is true for using conversations with Claude. I don't see a reasonable framework where training AI on copyrighted data is ok if and only if that data is not generated by AI

(Granted, only Meta got caught using Anna's Archive, but it seems safe to assume it's common practice. And even if it weren't, the websites in Common Crawl are still covered by copyright.)

causal • today at 4:38 PM

I wish people would stop using Anthropic's incorrect use of the term "distill". They don't share logits, so you can't distill. You can generate training data, which doesn't sound nearly so scary.

blackoil • today at 11:26 AM

'Issue' for who?

vrganj • today at 11:06 AM

Anthropic clearly doesn't respect other people's IP, it's real rich that they now insist on theirs being worthy of protection.

Fwiw, I think the concept of IP in general is counter to human progress.

matheusmoreira • today at 11:22 AM

> reuses the IP anthropic used to train the model

> disrespect of IP

Nobody other than Anthropic cares.

messe • today at 11:10 AM

> Alibaba reuses the IP anthropic used to train the model that's more akin to historical Chinese reverse engineering methods and disrespect of IP

Why is this any worse than Anthropic's disrespect of IP? You've apparently drawn a distinction between the two here, but I'm failing to see what it actually is.
