binarylogic • yesterday at 7:04 PM

Agree to an extent. There are absolutely unknown unknowns. But I think you'd be surprised how much data is obviously waste. Not the grey area, just pure garbage: health checks, debug logs left in production, redundant attributes.

That's why we break waste down into categories: https://docs.usetero.com/data-quality/categories/overview
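To make the "pure garbage" cases concrete, here's a rough Python sketch of what rule-based detection could look like. Everything in it is an assumption for illustration: the category names, the record fields (`level`, `env`, `attributes`, `http.path`), and the health-check paths are hypothetical and not taken from the linked docs.

```python
from typing import List, Optional

# Hypothetical health-check endpoints; adjust to whatever your services expose.
HEALTH_CHECK_PATHS = {"/health", "/healthz", "/ready", "/live"}


def classify_waste(record: dict) -> Optional[str]:
    """Return a (hypothetical) waste category for a log record, or None."""
    attrs = record.get("attributes", {})
    # Health checks: requests to well-known probe endpoints.
    if attrs.get("http.path") in HEALTH_CHECK_PATHS:
        return "health_check"
    # Debug logs left in production.
    if record.get("level", "").lower() == "debug" and record.get("env") == "production":
        return "debug_in_production"
    return None


def redundant_attributes(record: dict) -> List[str]:
    """List attribute keys whose value duplicates a top-level field value."""
    top_level_values = [v for k, v in record.items() if k != "attributes"]
    return sorted(
        key
        for key, value in record.get("attributes", {}).items()
        if value in top_level_values
    )


records = [
    {"level": "info", "env": "production", "message": "GET /healthz 200",
     "attributes": {"http.path": "/healthz"}},
    {"level": "debug", "env": "production", "message": "cache miss",
     "attributes": {}},
    {"level": "error", "env": "production", "message": "payment failed",
     "attributes": {"env": "production", "order_id": "42"}},
]

for r in records:
    print(classify_waste(r), redundant_attributes(r))
```

Static rules like these only catch the obvious cases. The third record is a real error and shouldn't be dropped, but its duplicated `env` attribute is still waste you can trim.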

But we don't stop there. You can go deeper with reasoning to root out the more nuanced waste. It's hard, but it's possible. That's where things get interesting.
