Anonymizing data is incredibly difficult to do: https://www.theguardian.com/technology/2014/jun/27/new-york-...
> New York City has released data of 173m individual taxi trips – but inadvertently made it "trivial" to find the personally identifiable information of every driver in the dataset.
It's really not, unless of course you are disincentivized from providing anonymous data. There is plenty of prior art, and there are existing solutions.
https://www.hhs.gov/hipaa/for-professionals/special-topics/d...
That example only demonstrates leaked information about the drivers, not the passengers/customers. And the "anonymized" driver and license data wouldn't need to be released in any form at all to produce a dataset useful for public transportation planning: approximate time of day and approximate location are sufficient to estimate demand, and there's no need to keep track of who is making which trips.
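As a rough sketch of what that could look like: drop the driver identifier entirely, round the pickup time down to the hour, snap the location to a coarse grid cell, and publish only counts per (hour, cell). Everything here is an assumption for illustration (the record layout, the sample trips, the `coarsen` and `demand_counts` names, the 0.01-degree cell size); it is not how the NYC dataset was produced. Coarsening alone is also not a guarantee of anonymity, since a cell with very few trips can still be identifying.

```python
import math
from collections import Counter
from datetime import datetime

# Hypothetical raw records: (driver_id, pickup_time, latitude, longitude).
RAW_TRIPS = [
    ("driver-A", "2013-01-01 08:05:12", 40.75812, -73.98551),
    ("driver-B", "2013-01-01 08:47:30", 40.75790, -73.98420),
    ("driver-A", "2013-01-01 09:15:00", 40.71234, -74.00601),
]


def coarsen(trip, cell_deg=0.01):
    """Discard the driver ID; keep only the hour and a coarse grid cell."""
    _driver_id, pickup_time, lat, lon = trip
    parsed = datetime.strptime(pickup_time, "%Y-%m-%d %H:%M:%S")
    hour = parsed.strftime("%Y-%m-%d %H:00")  # round down to the hour
    # Snap coordinates to a grid; cell_deg is an assumed resolution.
    cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
    return hour, cell


def demand_counts(trips, cell_deg=0.01):
    """Count trips per (hour, grid cell); no per-trip rows are kept."""
    return Counter(coarsen(trip, cell_deg) for trip in trips)


if __name__ == "__main__":
    for (hour, cell), n in sorted(demand_counts(RAW_TRIPS).items()):
        print(hour, cell, n)
```

The output contains no driver or license field at all, so there is nothing to reverse; the remaining risk is in sparse cells, which a real release would probably need to suppress or merge.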
Interesting read, thanks. The related article shows that even more robust anonymization techniques may still be insufficient (in the case of the taxi rides, spatio-temporal analysis could still lead to de-anonymization). All the more reason to reduce data collection. Unfortunately, the trend among governments around the world is the opposite.