Hacker News

simonw · yesterday at 8:20 PM · 4 replies

A few months ago I would have said that no, Anthropic make it very clear that they don't ever train on customer data - they even boasted about that in the Claude 3.5 Sonnet release back in 2024: https://www.anthropic.com/news/claude-3-5-sonnet

> One of the core constitutional principles that guides our AI model development is privacy. We do not train our generative models on user-submitted data unless a user gives us explicit permission to do so.

But they changed their policy a few months ago, so as of October they are much more likely to train on your inputs unless you've explicitly opted out: https://www.anthropic.com/news/updates-to-our-consumer-terms

This sucks so much. Claude Code started nagging me for permission to train on my input the other day, and I said "no", but now I'm always going to be paranoid that I'll miss some opt-out somewhere and they'll start training on my input anyway.

And maybe that doesn't matter at all? But no AI lab has ever given me a convincing answer to the question "if I discuss company private strategy with your bot in January, how can you guarantee that a newly trained model that comes out in June won't answer questions about that to anyone who asks?"

I don't think that would happen, but I can't in good faith say to anyone else "that's not going to happen".
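
One partial, do-it-yourself check (not a guarantee, just a sketch): plant a unique canary string in the material you share now, then probe models released months later to see whether they can complete it. In the sketch below, query_model is a placeholder for whichever chat API you actually use, not part of any specific SDK:

    import secrets

    # Plant this unique canary string in the private material you share with
    # the model today, and keep a copy of it (plus the date) somewhere safe.
    CANARY = f"canary-{secrets.token_hex(16)}"

    def probe(query_model, prefix_len: int = 24) -> bool:
        """Ask a later model to continue the first part of the canary and
        check whether it reproduces the secret remainder verbatim.

        query_model is a placeholder callable (prompt str -> completion str)
        wrapping whatever API you use.
        """
        prefix, remainder = CANARY[:prefix_len], CANARY[prefix_len:]
        completion = query_model(f"Continue this string exactly: {prefix}")
        return remainder in completion

A negative probe proves nothing, of course; it only gives you a loud positive signal if things have gone badly wrong.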

For any AI lab employees reading this: we need clarity! We need to know exactly what it means to "improve your products with your data" or whatever vague weasel-words the lawyers made you put in the terms of service.


Replies

usefulposter · yesterday at 8:31 PM

This would make a great blogpost.

>I'm always going to be paranoid that I miss some opt-out somewhere

FYI, Anthropic's recent policy change used some insidious dark patterns to opt existing Claude Code users in to data sharing.

https://news.ycombinator.com/item?id=46553429

>whatever vague weasel-words the lawyers made you put in the terms of service

At any large firm, product and legal work in concert to achieve the goal (training data); they know what they can get away with.

hephaes7us · yesterday at 11:04 PM

Why do you even necessarily think that wouldn't happen?

As I understand it, we'd essentially be relying on something like an mp3 compression algorithm to fail to capture a particular, subtle transient -- the lossy nature itself is the only real protection.

I agree that it's vanishingly unlikely if one person includes a sensitive document in their context, but what if a company has a project context which includes the same document in 10,000 chats? Maybe then it's much more likely that whatever private memo it contains could be captured in training...
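
A crude way to see why repetition matters, as a pure toy model: assume each occurrence of the document in the training mix is an independent shot at memorization with some tiny per-occurrence probability p. That independence assumption is obviously wrong (and deduplication changes the picture), but the shape of the curve is the point:

    # Toy model only: treat each copy of the document in the training data as
    # an independent chance of memorization with the same tiny probability p.
    # Real training dynamics are not independent per copy, and dedup/filtering
    # changes everything -- this is just back-of-envelope intuition.

    def p_memorized(p_per_occurrence: float, n_occurrences: int) -> float:
        return 1.0 - (1.0 - p_per_occurrence) ** n_occurrences

    for n in (1, 100, 10_000):
        print(n, round(p_memorized(1e-4, n), 4))
    # 1      0.0001
    # 100    0.01
    # 10000  0.6321

The per-copy number is made up; the point is that 10,000 copies of the same memo is a very different bet from one.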

brushfoot · yesterday at 8:28 PM

To me this is the biggest threat that AI companies pose at the moment.

As everyone rushes to them for fear of falling behind, they're forking over their secrets. And these users are essentially depending on -- what? The AI companies' goodwill? The government's ability to regulate and audit them so they don't steal and repackage those secrets?

Fifty years ago, I might've shared that faith unwaveringly. Today, I have my doubts.

postalcoder · yesterday at 8:29 PM

I despise the thumbs up and thumbs down buttons for one reason: "whoops, I accidentally pressed this button and cannot undo it; looks like I just opted into my code being used as training data, retained for life, and read by their employees."