NitpickLawyer, today at 6:00 AM

There's a lengthy discussion to be had here, and there's enough lawyerspeak in every provider's data retention policy to wiggle out of anything. A few notes from their current data use page:

> If you enable “Privacy Mode” in Cursor’s settings: zero data retention will be enabled for our model providers. Cursor may store some code data to provide extra features. None of your code will ever be trained on by us or any third-party.

Note the "may store some code data" and "none of your code will ever be trained on". In general you never want to include actual customer code in the training data, because of leaks that you may not want. Say someone has a hash somewhere, and your model autocompletes that hash. Bad. But that's not to say you couldn't train a reward model on pairs of prompts + completions. You have "some code data" (which could be acceptance rate) and use that. You just need to store the acceptance rate. And later, when you train new models, you check against that reward model. Does my new model reply close enough to score higher? If so, you're going in the right direction.
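To make the reward-model idea concrete, here is a minimal sketch of what I mean. Everything in it is an assumption for illustration: the two features (scaled completion length, whether the completion ends on a statement boundary), the logged data, and the function names are all made up, and nothing here is known to reflect what Cursor actually does. The point is only that accept/reject signals plus a few non-code features are enough to rank a new model against an old one.

```python
import math

def predict(w, b, x):
    """Predicted probability that a completion with features x is accepted."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(samples, epochs=500, lr=0.1):
    """Fit a tiny logistic regression with plain SGD: features -> P(accepted)."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, accepted in samples:
            err = predict(w, b, x) - accepted
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def mean_reward(w, b, completions):
    """Average predicted acceptance over a batch of completions."""
    return sum(predict(w, b, x) for x in completions) / len(completions)

# Hypothetical logged signals: ([scaled_length, ends_on_boundary], accepted).
# No source code is stored, only derived features and the accept/reject bit.
logged = [
    ([0.1, 1.0], 1), ([0.2, 1.0], 1), ([0.3, 1.0], 1),
    ([0.7, 0.0], 0), ([0.8, 0.0], 0), ([0.9, 0.0], 0),
]
w, b = train_reward_model(logged)

# Features of completions produced by two hypothetical model versions.
old_model = [[0.8, 0.0], [0.6, 0.0], [0.4, 1.0]]
new_model = [[0.2, 1.0], [0.3, 1.0], [0.5, 0.0]]

old_score = mean_reward(w, b, old_model)
new_score = mean_reward(w, b, new_model)
print("new model looks better" if new_score > old_score else "no improvement")
```

A real setup would use a learned model over much richer signals, but the comparison step at the end is the same shape: score both models' outputs and see which way the number moves.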

> If you choose to turn off “Privacy Mode”: we may use and store codebase data, prompts, editor actions, code snippets, and other code data and actions to improve our AI features and train our models.

Self-explanatory.

> Even if you use your API key, your requests will still go through our backend!

Your requests still pass through their servers, so they can collect data even if you bring your own key (BYOK).

> If you choose to index your codebase, Cursor will upload your codebase in small chunks to our server to compute embeddings, but all plaintext code for computing embeddings ceases to exist after the life of the request. The embeddings and metadata about your codebase (hashes, file names) may be stored in our database.

They don't store (nor need to store) plain text, but they may store embeddings and metadata. Again, you can use those to train other things, not necessarily models. You can use metadata to check if you're going in the right direction.
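For reference, the indexing flow described in that quote could look roughly like the sketch below. This is a guess at the general shape, not their implementation: `toy_embedding` is a stand-in (a hashed bag of tokens) for whatever real embedding model is used, and the chunk size, record fields, and function names are all hypothetical. What it shows is that the plaintext chunk only lives inside the function call, while the hash, file name, and vector are what get kept.

```python
import hashlib
import math

def split_chunks(source, lines_per_chunk=3):
    """Split source text into small chunks of a few lines each."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def toy_embedding(text, dims=8):
    """Stand-in for a real embedding model: normalized hashed bag of tokens."""
    vec = [0.0] * dims
    for token in text.split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index_file(file_name, source):
    """Return only what would be persisted: file name, chunk hash, embedding."""
    records = []
    for chunk in split_chunks(source):
        records.append({
            "file_name": file_name,
            "chunk_hash": hashlib.sha256(chunk.encode("utf-8")).hexdigest(),
            "embedding": toy_embedding(chunk),
        })
        # The plaintext chunk is not added to the record and goes out of
        # scope once the request is done.
    return records

example_source = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
records = index_file("example.py", example_source)
```

Even without plaintext, the stored hashes tell you when a chunk changed and the embeddings tell you what it is similar to, which is the kind of metadata I mean.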