Hacker News

selcuka yesterday at 9:40 AM

If it's worthless to AI vendors, they won't include it in the training corpus, so third parties won't have access to it.


Replies

estearum yesterday at 10:55 AM

They're alluding to something more like espionage, or just selling the interesting stuff you put in the text box.

bandrami yesterday at 12:39 PM

The worry is direct exfiltration, not training.

TZubiri yesterday at 6:06 PM

But it isn't worthless: the user is paying for that, and third parties are paying for it as well. That would only change if your inputs and outputs were completely different from everyone else's, which they aren't, because you are human, and I bet you have a profession that other humans share, along with many other qualities you have in common with other people.

In any case, relying on the chance that the LLM provider won't train on your data because of its presumably low value is about as good a strategy as crossing your fingers or venerating the god of rain. You should be relying on contractual clauses, at least when it comes to professional and client data.