I agree that the model can help troubleshoot and debug itself. I argue that the model has no acces...

pierrekin • yesterday at 7:20 PM • 3 replies • view on HN

I agree that the model can help troubleshoot and debug itself.

I argue that the model has no access to its thoughts at the time.

Split brain experiments notwithstanding I believe that I can remember what my faulty assumptions were when I did something.

If you ask a model “why did you do that” it is literally not the same “brain instance” anymore and it can only create reasons retroactively based on whatever context it recorded (chain of thought for example).

Replies

XenophileJKO • yesterday at 8:00 PM

Anthropic's introspection experiments have seemed to show that your argument is falsifiable.

https://www.anthropic.com/research/introspection

➕ show 1 reply

fragmede • yesterday at 7:52 PM

Claude code and codex both hide the Chain of Thought (CoT) but it's just words inside a set of <thinking> tags </thinking> and the agent within the same session has access to that plaintext.

➕ show 1 reply

jmalicki • yesterday at 7:27 PM

It does have access to its thoughts. This is literally what thinking models do. They write out thoughts to a scratch pad (which you can see!) and use that as part of the prompt.

➕ show 6 replies

alt Hacker News

Replies