Hacker News

famouswaffles yesterday at 8:11 PM

>What research shows that you can ask ChatGPT to explain its reasoning and why it said what it said, and that's guaranteed to actually be the motivation?

What research shows that you can ask a Human to explain its reasoning and why it said what it said, and that's guaranteed to actually be the motivation? There's no such thing. If anything, the research that exists suggests that any explanation we give is a post-hoc rationalization, even if the Human thinks otherwise.

https://transformer-circuits.pub/2025/introspection/index.ht...


Replies

embedding-shape yesterday at 9:13 PM

Why not try to answer my question, instead of asking a different one that I never claimed to have the answer to?