Hacker News

GCUMstlyHarmls | yesterday at 12:41 AM

I wonder how much wiggle room there is to collect now (to provide the service, context history, etc.), then later anonymise it (somehow, to some level) and then train on it?

Also, I wonder whether the ToS distinguishes "queries & interactions" from "uploaded data". I could imagine some tricky language in there that says we won't use your Word document, but we may at some point use the queries you run against it: not as a raw corpus, but as a second layer for examining which tools/workflows to expand or exploit.


Replies

danielheath | yesterday at 2:31 AM

“We don’t train on your data” doesn’t exclude metadata, training on derived datasets via some anonymisation process, etc.

There’s a range of ways to lie by omission here, and the major players have established a reputation for being willing to take an expansive view of their legal rights.