>The real-world application (and potential danger) is when this data is combined with other data. De-anonymization techniques using sparse datasets have been an active area of research for at least 15 years and it is often surprising to people how much can be gleaned from a few pieces of seemingly unconnected data.
Seems pretty handwavy. Can you describe concretely how this would work?
>Seems pretty handwavy.
It has a whole Wikipedia article and everything.
https://en.wikipedia.org/wiki/De-anonymization#Re-identifica...
>Can you describe concretely how this would work?
Here's one of the earlier papers I remember offhand, demonstrating one methodology. New statistical techniques (and improvements to existing ones) have appeared in the ~18 years since it was published. Not to mention there is significantly more data to work with now.
https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
"We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset."
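A toy sketch of the general idea, loosely following my reading of that paper. Each record is scored against the handful of ratings the adversary knows. Rare movies count for more than popular ones. A match is only accepted if the top score clearly stands out from the runner-up ("eccentricity"). Assumptions: the dataset is invented, the dates the paper also uses are dropped, the weight uses log(1 + support) to avoid dividing by zero, and the 1.5 threshold is an arbitrary illustrative choice, not a value taken from the paper.

```python
import math
import statistics

# Toy "anonymized" ratings: record id -> {movie id: star rating}. Invented data.
RECORDS = {
    "r1": {"m1": 5, "m2": 4},
    "r2": {"m1": 4, "m3": 2},
    "r3": {"m1": 3, "m2": 1, "m4": 5},
    "r4": {"m1": 5, "m3": 4},
    "r5": {"m1": 2, "m2": 5, "m5": 1},
    "r6": {"m1": 4, "m3": 3},
}


def support_counts(records):
    """How many records rated each movie; popular movies get large counts."""
    counts = {}
    for ratings in records.values():
        for movie in ratings:
            counts[movie] = counts.get(movie, 0) + 1
    return counts


def score(aux, record, support):
    """Sum of weights over the adversary's known ratings that this record matches."""
    total = 0.0
    for movie, rating in aux.items():
        # Count a match if the record rated the movie within one star of the guess.
        if movie in record and abs(record[movie] - rating) <= 1:
            # Rare movies weigh more; 1 + support avoids log(1) == 0 (assumed tweak).
            total += 1.0 / math.log(1 + support[movie])
    return total


def best_match(aux, records, threshold=1.5):
    """Return the record id that stands out from the rest, or None if none does."""
    support = support_counts(records)
    scores = {rid: score(aux, rec, support) for rid, rec in records.items()}
    ranked = sorted(scores.values(), reverse=True)
    sigma = statistics.pstdev(ranked)
    if sigma == 0:
        return None  # every record scored the same
    # Eccentricity: how far the top score sits above the runner-up, in std devs.
    eccentricity = (ranked[0] - ranked[1]) / sigma
    if eccentricity < threshold:
        return None  # best guess is not clearly better than the next one
    return max(scores, key=scores.get)
```

Knowing that someone gave a middling rating to a popular movie plus a 5 to an obscure one (`best_match({"m1": 3, "m4": 5}, RECORDS)`) singles out `r3`. Knowing only a rating for the popular movie (`{"m1": 4}`) matches almost everyone, so the sketch declines to guess and returns `None`.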
From the Wiki I linked:
"Researchers at MIT and the Université catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, was enough to uniquely identify 95 percent of them." [...] "A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person's whereabouts."
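A simplified sketch of the kind of measurement that study describes. Pick p points from a person's own location trace and check whether anyone else's trace also contains all of them. If nobody does, those p points single the person out. Assumptions: the traces, the (tower, hour) granularity and the repeated random sampling are invented for illustration and are not the study's exact procedure.

```python
import random

Point = tuple[str, int]  # (cell tower id, hour of day): coarse place and time


def is_unique(points: set[Point], user: str, traces: dict[str, set[Point]]) -> bool:
    """True if no other user's trace contains every one of `points`."""
    return not any(points <= trace for other, trace in traces.items() if other != user)


def unicity(traces: dict[str, set[Point]], p: int, trials: int = 100, seed: int = 0) -> float:
    """Share of (user, random sample) pairs where p of the user's points single them out."""
    rng = random.Random(seed)
    hits = total = 0
    for user, trace in traces.items():
        if len(trace) < p:
            continue  # not enough points to sample from
        for _ in range(trials):
            # sorted() gives random.sample the sequence it requires.
            points = set(rng.sample(sorted(trace), p))
            hits += is_unique(points, user, traces)
            total += 1
    return hits / total if total else 0.0


# Invented traces; dave's is identical to alice's, so neither can be singled out.
TRACES = {
    "alice": {("t1", 8), ("t2", 9), ("t3", 18), ("t1", 19)},
    "bob": {("t1", 8), ("t2", 9), ("t4", 18), ("t1", 19)},
    "carol": {("t1", 8), ("t5", 12), ("t3", 18), ("t6", 22)},
    "dave": {("t1", 8), ("t2", 9), ("t3", 18), ("t1", 19)},
}
```

With all four points, `unicity(TRACES, 4)` is 0.5: bob and carol are always unique, alice and dave never are. With a single point the value can only be lower, because a commonly shared point like `("t1", 8)` identifies nobody.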
Point being that operational security is hard, and it takes a lot less to "slip up" and accidentally reveal yourself than most people think. Narrowing someone's location to within 250 miles (or whatever) can be the key piece of information that lets other dots be connected.
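To put rough numbers on the "connecting dots" point, the sketch below multiplies out how much each clue shrinks the pool of candidates and expresses each clue as bits of information. Picking one person out of roughly 8 billion takes about 33 bits. Assumptions: the clues and the share of people each one matches are made up, and treating clues as independent is a simplification. Correlated clues narrow things less.

```python
import math


def anonymity_set(population, clue_fractions):
    """Expected number of people left after applying each clue.

    Assumes clues are independent, which real attributes often are not.
    """
    remaining = float(population)
    for fraction in clue_fractions:
        remaining *= fraction
    return remaining


def bits_of_information(fraction):
    """A clue matching `fraction` of people reveals -log2(fraction) bits."""
    return -math.log2(fraction)


# Hypothetical clues and the share of people each one matches (illustrative only).
CLUES = {
    "lives within ~250 miles of a given city": 0.01,
    "works in a particular industry": 0.02,
    "posts at hours consistent with a night shift": 0.1,
    "owns an uncommon breed of dog": 0.001,
}

if __name__ == "__main__":
    world = 8_000_000_000  # rough world population
    left = anonymity_set(world, CLUES.values())
    bits = sum(bits_of_information(f) for f in CLUES.values())
    print(f"{bits:.1f} of ~{math.log2(world):.1f} bits; ~{left:,.0f} candidates left")
```

Under these made-up numbers, four individually vague clues leave a pool of a few hundred people. At that point one more specific detail is often enough.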
Other examples (albeit with less explanation) include police takedowns of prolific CSAM producers, built by gathering bits and pieces of information over time until there was enough to make an identification.