arkensaw • today at 11:28 AM • 0 replies • view on HN

> This class of bug seems to be in the harness, not in the model itself. It’s somehow labelling internal reasoning messages as coming from the user, which is why the model is so confident that “No, you said that.”

That's from the article.

I don't think the evidence supports this. The model isn't mislabelling messages that already exist; it's fabricating things the user supposedly said. That isn't part of its reasoning.
