Using an LLM on data does not ingest that data into the training corpus. LLMs don’t “learn” from the information they operate on, contrary to what a lot of people assume.
None of the mainstream paid services ingest the data they operate on into their training sets. You will find a lot of conspiracy theories claiming that companies are saying one thing but secretly stealing your data, of course.
Information about the way we interact with the model's outputs can be used, via RLHF, to refine agent behaviour.
While this isn't the same as training the LLM on your data directly, it can involve aggregating insights from customer behaviour.
They are not directly ingesting the data into their training sets, but in most cases they are collecting it and will use it to train future models.
If they weren't, then why would enterprise-level subscriptions include specific terms stating that they don't train on user-provided data? There's no reason to believe they don't, and if they don't now, there's no reason to believe they won't later, whenever it suits them.
Wrong, buddy.
Many of the top AI services use human feedback to continuously apply "reinforcement learning" after the initial deployment of a pre-trained model.
https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...
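Roughly, the loop looks like this. A toy sketch of how thumbs-up/down feedback could become training signal; every name here is made up for illustration, and real reward models are neural networks, not label lists:

    # Toy RLHF feedback collection: user ratings on (prompt, response)
    # pairs become training signal for a reward model. No vendor's
    # real pipeline looks exactly like this.
    from dataclasses import dataclass

    @dataclass
    class FeedbackRecord:
        prompt: str
        response: str
        thumbs_up: bool  # the human signal RLHF runs on

    # Logged per conversation; note the raw prompt/response text
    # travels along with the rating.
    log: list[FeedbackRecord] = [
        FeedbackRecord("summarize this contract ...", "Here is a summary ...", True),
        FeedbackRecord("fix this SQL query ...", "Try DROP TABLE ...", False),
    ]

    def to_reward_examples(records: list[FeedbackRecord]) -> list[tuple[tuple[str, str], float]]:
        # Reward-model training pairs: (prompt, response) -> preference label.
        return [((r.prompt, r.response), 1.0 if r.thumbs_up else 0.0) for r in records]

    examples = to_reward_examples(log)

A reward model trained on those examples then steers later fine-tuning (e.g. PPO against the reward model), so user interactions shape future model behaviour even when raw chats never enter the pretraining corpus.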
Maybe prompts are enough to infer the rest?
> LLMs don’t “learn” from the information they operate on, contrary to what a lot of people assume.
Nothing is really preventing this, though. AI companies have already proven they will ignore copyright and any other legal nuisance so they can train models.
> You will find a lot of conspiracy theories claiming that companies are saying one thing but secretly stealing your data, of course.
It's not really a conspiracy when we have multiple examples of high-profile companies doing exactly this, and it keeps happening. Granted, I'm unaware of cases of this occurring currently with professional AI services, but it's basic security 101 that you should never give anything even the remote opportunity to ingest data unless you don't care about that data.
Companies have already shifted from not using customer data to giving users an option to opt out. For example:
“How can I control whether my data is used for model training?
If you are logged into Copilot with a Microsoft Account or other third-party authentication, you can control whether your conversations are used for training the generative AI models used in Copilot. Opting out will exclude your past, present, and future conversations from being used for training these AI models, unless you choose to opt back in. If you opt out, that change will be reflected throughout our systems within 30 days.” https://support.microsoft.com/en-us/topic/privacy-faq-for-mi...
At this point, suggesting it has never happened and never will is wildly optimistic.
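The mechanics of that opt-out are easy to picture. A hypothetical gate (not Microsoft's actual implementation; every name below is made up) showing where such a flag sits in a logging path:

    # Hypothetical opt-out gate; not Microsoft's actual code.
    training_log: list[dict] = []

    def maybe_log_for_training(user_settings: dict, record: dict) -> bool:
        # Honour the user's training opt-out before anything is retained.
        if user_settings.get("model_training_opt_out", False):
            return False  # conversation excluded from training pipelines
        training_log.append(record)
        return True

Note that the quoted 30-day window implies more than a gate on new conversations: "past, present, and future" means a retroactive pass that purges already-logged data after an opt-out.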