Hacker News

arkensaw, today at 11:28 AM

> This class of bug seems to be in the harness, not in the model itself. It’s somehow labelling internal reasoning messages as coming from the user, which is why the model is so confident that “No, you said that.”

From the article.

I don't think the evidence supports this. It's not mislabelling things; it's fabricating things the user supposedly said. That's not part of reasoning.