To be clear, we concede that the people weren't truly anonymous. But we did use an LLM to remove any identifying information from HN, making them quasi-anonymous; this is described in more detail in Table 2 of the appendix.
We also run a more realistic test in section 2. There we use the Anthropic Interviewer dataset, which Anthropic redacted; from the redacted interviews, our agent identified 9 of 125 people based on clues.
The blog post might be more approachable for a quick take: https://simonlermen.substack.com/p/large-scale-online-deanon...
But you also relied on people giving away too much personal information about themselves... which won't always be the case.
Thanks for that link! I'll put it in the top text.
Edit: actually I've re-upped your submission of that link and moved the links to the paper to the toptext instead. Hopefully this will ground the discussion more in the actual study.