All the vendors paraphrase user data, then use the paraphrased data for training. This is what their terms of service say.
They have significant experience in this. Microsoft software since 2014, for the most part, is also paraphrased from other people's code they find lying around online.
Why would they want to train on random garbage proprietary emails?
If their models ever spit out obviously confidential information belonging to their paying customers, they'll lose those customers to their competitors - and probably face significant legal costs as well.
Your random confidential corporate email really isn't that valuable for training. I'd argue it's more like toxic waste that should be avoided at all costs.
> Microsoft software since 2014, for the most part, is also paraphrased from other people's code they find lying around online.
That was pretty funny and explains a lot.
I wish I could do more :(
Instead, I always break things when I paraphrase code without the GeniusParaphrasingTool.
> All the vendors paraphrase user data, then use the paraphrased data for training. This is what their terms of service say.
It depends. E.g. OpenAI says: "By default, we do not train on any inputs or outputs from our products for business users, including ChatGPT Team, ChatGPT Enterprise, and the API."[0]
[0] https://openai.com/policies/how-your-data-is-used-to-improve...