Hacker News

coppsilgold today at 12:24 AM

While useful, it needs a big red warning for potential leakers. If they were personally served the documents (via email, while logged in, etc.), there really isn't much that can be done to ascertain the safety of leaking them. It's not even safe if there are two or more leakers and they "compare notes" to try to "clean" something for release.

https://en.wikipedia.org/wiki/Traitor_tracing#Watermarking

https://arxiv.org/abs/1111.3597

The watermark can even be contained in the wording itself (the choice among multiple versions of a sentence, word choice, etc. stores the entropy). The only moderately safe thing to leak would be a pure text full paraphrasing of the material. But that wouldn't inspire much trust as a source.
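A minimal sketch of how word choice alone can carry a recipient ID, and why comparing notes only exposes part of it. The phrasing table `VARIANTS`, the sentence template, and the function names are all made up for illustration; real schemes such as those in the links above are far more robust.

```python
# Hypothetical example: each slot has two interchangeable phrasings,
# and the phrasing chosen for a slot encodes one bit of the recipient ID.
VARIANTS = [
    ("begin", "start"),
    ("large", "big"),
    ("shows", "demonstrates"),
    ("however", "but"),
]

# Made-up sentence with one placeholder per slot.
TEMPLATE = "We {0} with a {1} sample, which {2} the effect; {3} more data is needed."


def embed(recipient_id: int) -> str:
    """Produce the copy of the sentence served to one recipient."""
    bits = [(recipient_id >> i) & 1 for i in range(len(VARIANTS))]
    words = [pair[bit] for pair, bit in zip(VARIANTS, bits)]
    return TEMPLATE.format(*words)


def extract(text: str) -> int:
    """Recover the recipient ID from which phrasing appears in each slot."""
    for mark in ",;.":
        text = text.replace(mark, " ")
    tokens = text.split()
    recipient_id = 0
    for i, (_zero_word, one_word) in enumerate(VARIANTS):
        if one_word in tokens:
            recipient_id |= 1 << i
    return recipient_id


def visible_slots(text_a: str, text_b: str) -> list[int]:
    """Slots two recipients can spot by comparing their copies."""
    diff = extract(text_a) ^ extract(text_b)
    return [i for i in range(len(VARIANTS)) if (diff >> i) & 1]


copy_a = embed(5)
copy_b = embed(13)
print(copy_a)
print(visible_slots(copy_a, copy_b))
```

Recipients 5 and 13 differ in only one slot here, so comparing their copies reveals that slot and leaves the other three marks untouched in whatever they release.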


Replies

crazygringo today at 12:35 AM

This doesn't seem to be designed for leakers, i.e. people sending PDFs -- it's specifically for people receiving untrusted files, i.e. journalists.

And specifically about them not being hacked by malicious code. I'm not seeing anything that suggests it's about trying to remove traces of a file's origin.

I don't see why it would need a warning for something it's not designed for at all.

apyrgiotis today at 9:27 AM

Oof, that's a great point. We briefly touched on this a few weeks ago, but from the angle of canary tokens / tracking pixels [1].

Security-wise, our main concern is protecting people who read suspicious documents, such as journalists and activists, but we do have sources/leakers in our threat model as well. Our docs are lacking in this regard, but we will update them with information targeted specifically to non-technical sources/leakers about the following threats:

- Metadata (simple/deep)

- Redactions (surprisingly easy to get wrong)

- Physical watermarking (e.g., printer tracking dots)

- Digital watermarking (what you're pointing out here)

- Fingerprinting (camera, audio, stylometry)

- Canary tokens (not metadata per se, but still a de-anonymization vector)

If you come to FOSDEM next week, we plan to talk about this subject there [2].

The goal here isn't to provide a false sense of security, nor to frighten people. It's plain old harm reduction. We know that sources share documents that can help get a story out (and we encourage it), but we also want to educate them about the circumstances in which those documents may contain their PII, so that they can make an informed choice.

[1]: https://social.freedom.press/@dangerzone/115859839710582670

[2]: https://fosdem.org/2026/schedule/event/JZ3F8W-dangerzone_ble...

(Dangerzone dev btw)

alphazard today at 12:29 AM

I seem to remember Yahoo Finance (I think it was them, maybe someone else) introducing benign errors into their market data feeds, to prevent scraping. This led to people doing 3 requests instead of just 1, to correct the errors, which was very expensive for them, so they turned it off.

I don't think watermarking is a winning game for the watermarker: with enough copies, any errors can be cancelled.
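A rough sketch of the "3 requests instead of 1" idea, assuming each response carries its injected error at a different position. The data, the `with_errors` helper, and the 0.01 offset are invented for illustration and are not how any real feed worked.

```python
from collections import Counter


def with_errors(values, bad_positions, delta=0.01):
    """Return a copy of the feed with small errors at the given positions."""
    return [v + delta if i in bad_positions else v for i, v in enumerate(values)]


def majority_vote(copies):
    """For each position, keep the value that most copies agree on."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*copies)]


truth = [100.0, 101.5, 99.25, 102.0]

# Three requests, each with its error at a different position.
copies = [with_errors(truth, {0}), with_errors(truth, {2}), with_errors(truth, {3})]
recovered = majority_vote(copies)
print(recovered == truth)

# If two of the three copies share the same mark, the vote keeps it.
shared = [with_errors(truth, {1}), with_errors(truth, {1}), with_errors(truth, set())]
still_marked = majority_vote(shared)
print(still_marked == truth)
```

The vote only cancels errors that most copies disagree on. A mark that the majority of the collected copies have in common survives it, which is the situation the traitor-tracing links at the top of the thread are concerned with.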

normie3000 today at 3:51 AM

> The only moderately safe thing to leak would be a pure text full paraphrasing of the material. But that wouldn't inspire much trust as a source.

Isn't this what newspapers do?

robertk today at 2:31 AM

Why not leak a dataset of N full text paraphrasings of the material, together with a zero-knowledge proof of how to take one of the paraphrasings and specifically "adjust" it to the real document (revealed in private to trusted asking parties)? Then the leaker can prove they released "at least the one true leak" without incriminating themselves. There is a cryptographic solution to this issue.