
gruez 01/21/2025

>"We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset."

> [...]

> "Researchers at MIT and the Université catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, was enough to uniquely identify 95 percent of them." [...] "A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person's whereabouts."

The only reason the two attacks work is that you have access to a bunch of uncorrelated data points. That is, ratings for various shows and their dates, and cellphone movement patterns. It's unclear how you could extend this to some guy you're trying to dox on Signal. The geo info is relatively coarse and stays static, so singling out one person is going to be difficult. To put another way, "guy was vaguely near New York on these dates" doesn't narrow down the search parameters by much. That's going to be true for millions of people.


Replies

ziddoap 01/21/2025

>To put another way, "guy was vaguely near New York on these dates" doesn't narrow down the search parameters by much.

That's why I said that this data alone is probably worthless, but can gain value when combined with other data. ("As a piece of data alone, the results are probably not of significant use")

The combining of data is the important bit and the entire emphasis of both of my other comments.

Two pieces of otherwise anonymous data can, when combined, lead to re-identification.
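
A rough sketch of what "combining" means here, using entirely made-up data. The names, regions, dates, and both helper functions are hypothetical, and it assumes the auxiliary data happens to cover every date on which the anonymous account was seen, which real data often won't. Each coarse sighting alone matches several people; intersecting them shrinks the candidate set.

```python
# Toy linkage example with invented data; nothing here reflects a real dataset.

# Dataset A (hypothetical): coarse (region, date) sightings of one anonymous account.
anon_sightings = {
    ("NYC", "2025-01-03"),
    ("CHI", "2025-01-10"),
    ("NYC", "2025-01-17"),
}

# Dataset B (hypothetical): auxiliary data placing named people in a region on a
# date, e.g. scraped from public posts.
public_posts = {
    "alice": {("NYC", "2025-01-03"), ("NYC", "2025-01-17")},
    "bob": {("NYC", "2025-01-03"), ("CHI", "2025-01-10"), ("NYC", "2025-01-17")},
    "carol": {("NYC", "2025-01-03"), ("CHI", "2025-01-10")},
}


def candidates(sightings, aux):
    """Names whose known whereabouts include every given sighting."""
    return sorted(name for name, places in aux.items() if sightings <= places)


def narrowing(sightings, aux):
    """Candidate count after each additional sighting, in date order."""
    ordered = sorted(sightings, key=lambda s: s[1])
    return [len(candidates(set(ordered[:k]), aux)) for k in range(1, len(ordered) + 1)]


if __name__ == "__main__":
    print(narrowing(anon_sightings, public_posts))
    print(candidates(anon_sightings, public_posts))
```

With this toy data the first sighting matches all three people, and only one name remains consistent once all three sightings are combined. How fast real data narrows depends on how coarse the regions are and how much auxiliary data exists.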
