logoalt Hacker News

roywigginslast Friday at 3:34 PM0 repliesview on HN

> In much the same way as any human with modest experience in adding two digit numbers doesn't necessarily sit there and do the full elementary school addition algorithm but jumps to the correct answer in fewer steps by virtue of just having a very trained neural net.

Right, which is strictly worse than humans are at reporting how they solve these sorts of problems. Humans can tell you whether they did the elementary school addition algorithm or not. It seems like Claude actually doesn't know, in the same way humans don't really know how they can balance on two legs, it's just too baked into the structure of their cognition to be able to introspect it. But stuff like "adding two-digit numbers" is usually straightforwardly introspectable for humans, even if it's just "oh, I just vibed it" vs "I mentally added the digits and carried the one"- humans can mostly report which it was.

Here's Anthropic's research:

https://www.anthropic.com/research/tracing-thoughts-language...