Hacker News

Johnbot today at 2:58 PM

A lot of geolocation data on the market is anonymized: pings are keyed to medium-lived unique IDs that can't be mapped to other identifiers. The problem is that if you have precise locations, or enough samples to statistically infer precise locations, you can often de-anonymize the IDs. You can purchase address and resident listings from a number of data vendors, and by checking where a device returns at night you can figure out its home address. Then, if you find information on the residents (workplaces, schools, etc.), you can check whether the device goes where each resident of that address is likely to go, and you now have a pretty good idea of exactly who the device belongs to.
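
The home-address step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any vendor's actual pipeline: the night window (22:00 to 05:59), the grid precision (3 decimal places, roughly 100 m), and the shape of the purchased listing (`address_listing`, keyed by grid cell) are all hypothetical. The second step, checking whether the device visits each resident's workplace or school, is not shown.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: "night" is 22:00-05:59 local time; a real analysis would tune this.
NIGHT_HOURS = set(range(22, 24)) | set(range(0, 6))


def grid_cell(lat: float, lon: float, precision: int = 3) -> tuple[float, float]:
    """Snap a coordinate to a grid cell; 3 decimal places is roughly 100 m."""
    return (round(lat, precision), round(lon, precision))


def infer_home_cells(pings):
    """Map each device ID to the grid cell where it is seen most often at night.

    pings: iterable of (device_id, datetime, lat, lon) tuples.
    """
    night_counts = defaultdict(Counter)
    for device_id, ts, lat, lon in pings:
        if ts.hour in NIGHT_HOURS:
            night_counts[device_id][grid_cell(lat, lon)] += 1
    return {dev: counts.most_common(1)[0][0] for dev, counts in night_counts.items()}


def candidate_residents(home_cells, address_listing):
    """Join inferred home cells against a hypothetical purchased address/resident listing.

    address_listing: {grid_cell: [resident names]}.
    """
    return {dev: address_listing.get(cell, []) for dev, cell in home_cells.items()}


pings = [
    ("abc123", datetime(2024, 5, 1, 23, 15), 40.71281, -74.00602),
    ("abc123", datetime(2024, 5, 2, 2, 40), 40.71279, -74.00598),
    ("abc123", datetime(2024, 5, 2, 13, 5), 40.75800, -73.98550),  # daytime, ignored
]
listing = {grid_cell(40.713, -74.006): ["Resident A", "Resident B"]}
homes = infer_home_cells(pings)
print(candidate_residents(homes, listing))  # {'abc123': ['Resident A', 'Resident B']}
```

Note that the "anonymous" ID never has to be linked to anything else: the join happens purely on location.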


Replies

rockskon today at 3:31 PM

There is no such thing as anonymized location data when you know where and when someone sleeps and works.

It's a rhetorical fiction the ad industry tells itself.

ramoz today at 7:07 PM

From what I've seen, none of this is that complex: one could simply 'draw a circle around your house', pull all the "anonymized" device pings inside it, and trace those IDs.
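
The "draw a circle" approach amounts to a geofence query followed by a per-ID lookup. A minimal sketch, assuming pings arrive as hypothetical `(device_id, timestamp, lat, lon)` tuples and using the standard haversine great-circle distance:

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def devices_in_circle(pings, center_lat, center_lon, radius_m):
    """IDs of every device with at least one ping inside the circle."""
    return {dev for dev, _ts, lat, lon in pings
            if haversine_m(center_lat, center_lon, lat, lon) <= radius_m}


def trace(pings, device_id):
    """All pings for one device ID, in time order."""
    return sorted((ts, lat, lon) for dev, ts, lat, lon in pings if dev == device_id)


pings = [
    ("a", datetime(2024, 5, 1, 23, 0), 40.7128, -74.0060),  # at the house
    ("a", datetime(2024, 5, 2, 9, 0), 40.7580, -73.9855),   # later, elsewhere
    ("b", datetime(2024, 5, 1, 23, 0), 40.7228, -74.0060),  # about 1.1 km north
]
for dev in devices_in_circle(pings, 40.7128, -74.0060, radius_m=100):
    print(dev, trace(pings, dev))
```

Once an ID is caught inside the circle, every other ping carrying that ID comes along with it, which is the whole problem.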

teraflop today at 4:17 PM

We should have learned this lesson 20 years ago when researchers were able to deanonymize a lot of the Netflix Prize dataset, which contained nothing except movie ratings and their associated dates.

https://arxiv.org/abs/cs/0610105

If movie ratings are vulnerable to pattern-matching from noisy external sources, then it should be obvious that location data is enormously more vulnerable.
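
A simplified sketch of that kind of pattern-matching, loosely following the linked paper's intuition that rare items carry most of the identifying signal. The weighting (1 / log of an item's popularity), the ±1 rating and ±14-day tolerances, and the `min_gap` threshold are illustrative assumptions, not the paper's exact algorithm. The same scoring carries over to location data if each (place, time) pair is treated as an "item".

```python
import math
from collections import Counter
from datetime import date


def item_weights(dataset):
    """Rarer items count for more: weight = 1 / log(1 + number of users who rated it)."""
    support = Counter()
    for ratings in dataset.values():
        support.update(ratings.keys())
    return {item: 1 / math.log(1 + n) for item, n in support.items()}


def score(record, aux, weights, rating_tol=1, days_tol=14):
    """Sum the weights of auxiliary ratings that roughly agree with this record."""
    total = 0.0
    for item, (rating, when) in aux.items():
        if item in record:
            r, d = record[item]
            if abs(r - rating) <= rating_tol and abs((d - when).days) <= days_tol:
                total += weights[item]
    return total


def best_match(dataset, aux, min_gap=0.5):
    """Return the best-scoring user ID, or None if it doesn't clearly beat the runner-up."""
    weights = item_weights(dataset)
    ranked = sorted(((score(rec, aux, weights), uid) for uid, rec in dataset.items()),
                    reverse=True)
    if not ranked:
        return None
    top, uid = ranked[0]
    second = ranked[1][0] if len(ranked) > 1 else 0.0
    return uid if top - second >= min_gap else None


dataset = {
    "u1": {"Obscure Film": (5, date(2005, 3, 1)), "Blockbuster": (4, date(2005, 3, 2))},
    "u2": {"Blockbuster": (4, date(2005, 3, 3)), "Other Film": (2, date(2005, 1, 1))},
    "u3": {"Blockbuster": (3, date(2005, 4, 1))},
}
# Noisy outside knowledge: approximate ratings and dates, e.g. from public reviews.
aux = {"Obscure Film": (4, date(2005, 3, 5)), "Blockbuster": (4, date(2005, 3, 1))}
print(best_match(dataset, aux))  # u1
```

Knowing only the popular film is not enough to single anyone out; adding one obscure title is.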

vovanidze today at 3:37 PM

Exactly. Calling it 'anonymized' is pure security theater once you have enough data points to map out someone's daily routine.

Waiting for legislation or EULAs to fix this is a lost cause, since adtech always finds a loophole. The fix has to be architectural: moving toward stateless proxies that strip device identifiers at the edge, before they ever hit upstream servers. If the payload never touches a persistent DB, there is literally nothing to de-anonymize. Stateless infra is the only sane way forward.
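
A minimal sketch of what such an edge step might do, assuming a flat JSON-like payload. The field names in `IDENTIFIER_FIELDS` are hypothetical, since real SDK payloads differ by vendor. Coordinates are also coarsened because, as other replies point out, precise coordinates identify people on their own; even so, a timestamped sequence of coarse points can still be identifying, so this is an illustration rather than a guarantee.

```python
from typing import Any

# Hypothetical field names; real SDK payloads differ by vendor.
IDENTIFIER_FIELDS = frozenset({"device_id", "ad_id", "ip", "user_agent"})


def strip_at_edge(payload: dict[str, Any], coord_precision: int = 2) -> dict[str, Any]:
    """Drop identifier fields and coarsen coordinates before forwarding upstream.

    This is a pure function: it returns a new dict and writes nothing to disk,
    so the proxy hop itself keeps no per-device state.
    """
    cleaned = {k: v for k, v in payload.items() if k not in IDENTIFIER_FIELDS}
    for key in ("lat", "lon"):
        if key in cleaned:
            # 2 decimal places is roughly 1 km of latitude.
            cleaned[key] = round(cleaned[key], coord_precision)
    return cleaned


event = {"device_id": "abc123", "ip": "203.0.113.7",
         "lat": 40.71281, "lon": -74.00602, "event": "app_open"}
print(strip_at_edge(event))  # {'lat': 40.71, 'lon': -74.01, 'event': 'app_open'}
```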

sroussey today at 3:29 PM

Companies exist that de-anonymize other data brokers' data. That lets the other brokers claim they sell anonymized data while the end users still get everything.

ninalanyon today at 4:10 PM

In what sense can the latitude and longitude of my house be called anonymous data?

jandrewrogers today at 3:57 PM

Location and identity are inextricably linked. You can't destroy identity without also destroying location and location is critical for myriad purposes.

The analytic reconstruction of identity from location is far more sophisticated than the scenarios people imagine. You don't need to know where they live to figure out who they are. Every human leaves a fingerprint in space-time.
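
The simplest version of the "fingerprint in space-time" idea: index every trace by its (place, time) points, then intersect the candidate sets for the few points you happen to know. The coarse `(cell_id, hour)` representation and the sample traces are assumptions for illustration; the techniques the comment alludes to are presumably far more sophisticated.

```python
from collections import defaultdict


def build_index(traces):
    """Invert {device_id: set of (cell_id, hour) points} into {point: set of device_ids}."""
    index = defaultdict(set)
    for dev, points in traces.items():
        for point in points:
            index[point].add(dev)
    return index


def anonymity_set(index, known_points):
    """Devices consistent with every known point; size 1 means the trace is unique.

    Expects at least one known point (an empty list returns an empty set).
    """
    candidates = [index.get(point, set()) for point in known_points]
    return set.intersection(*candidates) if candidates else set()


traces = {
    "d1": {("cellA", 8), ("cellB", 9), ("cellC", 18)},
    "d2": {("cellA", 8), ("cellD", 9), ("cellC", 18)},
    "d3": {("cellA", 8), ("cellB", 9)},
}
index = build_index(traces)
known = [("cellA", 8), ("cellB", 9), ("cellC", 18)]
for k in range(1, len(known) + 1):
    print(k, sorted(anonymity_set(index, known[:k])))
```

Each extra observed point shrinks the candidate set; here three points are enough to leave exactly one device, with no home address involved.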

1121redblackgo today at 3:21 PM

Yep. With side channels, or by thinking one step beyond what the laws literally cover, it's trivial to get around said laws. We need better laws.

malfist today at 3:48 PM

> A lot of geolocation data on the market is anonymized

A lot isn't good enough.