Hacker News

willmadden, today at 5:02 PM

I'm not sure why your comment is grayed out.

Cell tower data, credit bureau integration, social media scraping, Palantir, smart home device surveillance, DNA database exploitation, facial recognition networks, tax, payroll, passport, visa, Medicare/Medicaid, immigration and customs databases, and many more...

The census is a historical relic used to gerrymander congressional seats, and that's about it.


Replies

everforward, today at 5:30 PM

Census data provides a reliable source to build off of, which makes joining between data sets more reliable. A lot of what you're talking about would be partial prints of an identity that have to be joined up with others to give reliable data.

E.g.:

> Cell tower data

That's just going to get you a subscriber and device ID, unless you're talking about doing deep packet inspection and parsing the contents of the packets. You could, but that's a lot of effort to get something the census can hand you for free.

> credit bureau integration

Notoriously unreliable, and identities used for credit get stolen constantly. The easiest way to clean that up is against known-good info, like the census.

> social media scraping

Half the profiles are fake, so it's also not reliable data unless you clean it up. Again, census data makes it very easy to cut out profiles that don't match a real person.

> tax, payroll

These are probably fairly reliable, although they usually won't tell you about a person's demographics.

> passport, visa, Medicare/Medicaid, immigration and customs databases

There's an enormous part of the population that won't appear in these at all. The huge part of the country that's "working poor" but not poor enough for Medicaid probably isn't traveling internationally. I wouldn't be surprised if half the country doesn't appear in any of these.

The census has value in that it contains a huge depth of information and is tied to your identity. Citizens are compelled by law to answer, so even the privacy folks have to respond, and lying on it is a crime (enforcement is probably non-existent, though).

I'm sure that can all be reconstructed to some level of accuracy given sufficient effort, but that's a lot harder and requires a ton more coordination than "SELECT * FROM census_data WHERE ..."
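
To make the "clean it against known-good info" point concrete, here's a rough sketch of that kind of join. Everything in it is made up for illustration: the `reference_people` and `scraped_profiles` tables, their columns, and the rows are all hypothetical, and it assumes exact matches on name, birth year, and ZIP, which real data rarely gives you.

```python
import sqlite3

# Hypothetical in-memory tables; none of these names or rows come from a real dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reference_people (name TEXT, birth_year INTEGER, zip TEXT);
CREATE TABLE scraped_profiles (handle TEXT, name TEXT, birth_year INTEGER, zip TEXT);
INSERT INTO reference_people VALUES
  ('Ada Example', 1980, '10001'),
  ('Bob Sample', 1975, '94105');
INSERT INTO scraped_profiles VALUES
  ('@ada', 'Ada Example', 1980, '10001'),
  ('@bot123', 'Zed Nobody', 1999, '00000'),
  ('@bob', 'Bob Sample', 1975, '94105');
""")

def matched_profiles(conn):
    """Return handles of scraped profiles that match a row in the reference table."""
    rows = conn.execute("""
        SELECT s.handle
        FROM scraped_profiles AS s
        JOIN reference_people AS r
          ON s.name = r.name
         AND s.birth_year = r.birth_year
         AND s.zip = r.zip
        ORDER BY s.handle
    """).fetchall()
    return [handle for (handle,) in rows]

print(matched_profiles(conn))  # ['@ada', '@bob']
```

The inner join drops `@bot123` because nothing in the reference table backs it up. Without a trusted reference table you'd be stuck doing fuzzy matching across several partial sources instead.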
