I’m not a CSAM-detection expert, but after my suspension I ended up doing a lot of research into how these systems work and where they fail. One important point: Google isn’t just using PhotoDNA-style perceptual hashing. They’re also running AI-based classifiers on Drive content, and that second layer is far more opaque and far more prone to false positives.
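For context on how the first layer works: a perceptual hash boils an image down to a short fingerprint, and a file is flagged only if that fingerprint lands within a small Hamming distance of an entry on a known-bad list. PhotoDNA itself is proprietary, so this is just a minimal sketch using the open `imagehash` library's pHash as a stand-in; the list entry and threshold below are placeholders, not anything Google publishes:

```python
# Sketch of the hash-matching layer. pHash from the open `imagehash` library
# stands in for PhotoDNA; the list entry and threshold are placeholders.
import imagehash
from PIL import Image

KNOWN_BAD = [imagehash.hex_to_hash("d1c1b2a2e4f0c8c8")]  # placeholder entry
MAX_DISTANCE = 8  # bits of Hamming distance tolerated; real thresholds are tuned

def is_flagged(path: str) -> bool:
    # Deterministic: the same image checked against the same list and
    # threshold produces the same answer every time.
    h = imagehash.phash(Image.open(path))
    return any(h - known <= MAX_DISTANCE for known in KNOWN_BAD)
```

That determinism is what makes the hash layer auditable in principle. The classifier layer has no open equivalent: its verdicts come from a model and a threshold nobody outside Google can inspect.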
That’s how you get situations like mine: ~700 problematic images in a ~700k-image dataset triggered Google to delete 130,000+ completely unrelated files and shut down my entire developer ecosystem.

Hash-matching is predictable; AI classification is not (rough sketch below). And Google’s hybrid pipeline:

- isn’t independently vetted
- isn’t externally audited
- isn’t reproducible
- offers no recourse when it’s wrong
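On the "isn't reproducible" point specifically: a hash check is a pure function of the file, the list, and a threshold, so anyone holding those inputs can replay the decision. A classifier verdict also depends on model weights, a model version, and a cutoff score that only Google holds, and any of those can change silently. A purely hypothetical illustration (none of these names are Google's):

```python
# Hypothetical illustration of the structural difference; these names and
# signatures are mine, not Google's.

def hash_verdict(image_hash, known_hashes, max_distance):
    # Replayable by anyone who holds the same list and threshold.
    return any(image_hash - known <= max_distance for known in known_hashes)

def classifier_verdict(image_bytes, model, threshold):
    # Not replayable from outside: the weights, model version, and threshold
    # are private and can change without notice, so yesterday's "clean" file
    # can become today's "flagged" one.
    score = model.predict(image_bytes)  # opaque score, e.g. in [0, 1]
    return score >= threshold
```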
In practice, it’s a black box that can erase an innocent researcher or indie dev overnight. I wrote about this after experiencing it firsthand — how poisoned datasets + opaque AI detection create “weaponized false positives”: https://medium.com/@russoatlarge_93541/weaponized-false-posi...
I agree with the point above: if open, developer-accessible perceptual hashing tools existed — even via bloom filters or ZK membership proofs — this entire class of collateral damage wouldn’t happen.
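To make the bloom-filter half of that concrete: a provider could publish a compact probabilistic filter over its hash list, letting developers check their own files' perceptual hashes client-side, before upload, without the raw list ever being disclosed. A minimal hand-rolled sketch, with untuned parameters and placeholder hashes:

```python
# Toy Bloom filter: answers "might this hash be on the list?" without
# shipping the list itself. Sizes and hash count are illustrative, not tuned.
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 7):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive k positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # May return a false positive; never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# Hypothetical usage: the provider ships `flagged`, the developer checks a
# file's perceptual hash locally before it ever touches the provider's API.
flagged = BloomFilter()
flagged.add(b"d1c1b2a2e4f0c8c8")                    # placeholder perceptual hash
print(flagged.might_contain(b"d1c1b2a2e4f0c8c8"))   # True
print(flagged.might_contain(b"ffffffffffffffff"))   # almost certainly False
```

One caveat: perceptual hashes need near-match lookups, not just exact membership, so a real deployment would have to enumerate neighbors within the Hamming radius or use a different structure; ZK membership proofs attack the same problem from the other side, proving whether a hash is on a committed list without revealing the list.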
Instead, Big Tech keeps the detection tools proprietary while outsourcing the liability to everyone else. If their systems are wrong, we pay the cost — not them.