Hacker News

pavel_lishin today at 3:38 PM (3 replies)

Anonymizing data is incredibly difficult to do: https://www.theguardian.com/technology/2014/jun/27/new-york-...

> New York City has released data of 173m individual taxi trips – but inadvertently made it "trivial" to find the personally identifiable information of every driver in the dataset.


Replies

afarah1 today at 4:01 PM

Interesting read, thanks. The related article shows that even more robust anonymization techniques may still be insufficient (in the case of the taxi rides, spatiotemporal analysis could still lead to de-anonymization). All the more reason to reduce data collection. Unfortunately, the trend is the opposite for governments all around the world.

the_sleaze_ today at 5:03 PM

It's really not, unless of course you are disincentivized to provide anonymous data. The ground is thick with prior art and existing solutions.

https://www.hhs.gov/hipaa/for-professionals/special-topics/d...

wtallis today at 4:10 PM

That example only demonstrates leaked information of the drivers, not the passengers/customers. And the "anonymized" driver and license data wouldn't need to be released in any form at all to produce a dataset useful for public transportation planning purposes: approximate time of day and approximate location are sufficient to estimate demand, and there's no need to keep track of who is making which trips.
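
A minimal sketch of that kind of release, under stated assumptions: the field names (`medallion`, `pickup_time`, `lat`, `lon`), the 0.01-degree grid, the one-hour buckets, and the `min_count` suppression threshold are all hypothetical choices for illustration, not anything taken from the actual NYC dataset or its schema. The point is only that driver and license fields are never read, and that time and location are coarsened before counting. As noted elsewhere in the thread, coarsening alone may still not be enough against spatiotemporal analysis, so treat this as a starting point rather than a guarantee.

```python
import math
from collections import Counter
from datetime import datetime


def coarsen_trip(trip, grid_deg=0.01, bucket_minutes=60):
    """Map one trip to (time-of-day bucket, lat cell, lon cell).

    Only the pickup time and coordinates are read; any identifier
    fields on the record (hypothetical 'medallion' etc.) are ignored.
    """
    ts = datetime.fromisoformat(trip["pickup_time"])
    minutes = ts.hour * 60 + ts.minute
    start = (minutes // bucket_minutes) * bucket_minutes
    time_bucket = f"{start // 60:02d}:{start % 60:02d}"
    # Integer grid indices avoid publishing exact coordinates.
    lat_cell = math.floor(trip["lat"] / grid_deg)
    lon_cell = math.floor(trip["lon"] / grid_deg)
    return (time_bucket, lat_cell, lon_cell)


def estimate_demand(trips, min_count=1, **coarsen_kwargs):
    """Count trips per coarse cell, dropping cells below min_count.

    min_count is an assumed extra safeguard: sparsely populated cells
    are the ones most likely to single out an individual trip.
    """
    counts = Counter(coarsen_trip(t, **coarsen_kwargs) for t in trips)
    return {cell: n for cell, n in counts.items() if n >= min_count}
```

The output contains only aggregate counts per cell, so there is nothing tying two trips to the same driver or passenger.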
