Claude doesn't know why it acted the way it did; it is only predicting why it acted. I see people falling for this trap all the time.
Yes, this is a hard pitfall to avoid. It is very easy to interpret the LLM in ways there is no real basis for.
It's not even predicting why it acted; it's predicting an explanation of why it acted, which is even worse, since there's no consistent mental model behind it.
IDK how far AIs are from intelligence, but they are close enough that there is no room for anthropomorphizing them: when they are anthropomorphized, it's assumed to be a misunderstanding of how they work.
Whereas if someone says "geez, my computer really hates me today" when it's slow to start, we don't feel the need to explain that the computer cannot actually feel hatred. We understand the analogy.
I mean, your distinction is totally valid, and I don't blame you for making it, because I think there is a huge misunderstanding. But when I have the same thought, it often occurs to me that people aren't necessarily speaking literally.
That's because when the failure is part of the context, it can clearly express the intent not to fall for it again. But when only the original problem is in the context, none of that obviousness applies.
Very typical, and it gives LLMs that annoying Captain Hindsight-like behaviour.