Which paper? I've read all the titles and looked at a few from the past year, but it's not obvious which you're referring to.
I did also, accidentally, find some "I tried the obvious thing and the results challenge the paper's narrative" criticism of one of Anthropic's recent papers: https://www.greaterwrong.com/posts/kfgmHvxcTbav9gnxe/introsp.... So that's significantly reduced my overall trust in this research team's interpretation of their own results – specifically, their assertions of the form "there must exist". (Several people in the comments there claim to have designed their own experiments that replicate Anthropic's claims, but none of the ones I've looked at actually do: they have even more obvious flaws, like arXiv:2602.11358 being indistinguishable from "the prompt says to tell a first-person story about an AI system gaining sentience after being given a special prompt, and homonyms are represented differently within a model".)