I agree that the model can help troubleshoot and debug itself.
I argue that the model has no access to its thoughts at the time.
Split brain experiments notwithstanding I believe that I can remember what my faulty assumptions were when I did something.
If you ask a model “why did you do that” it is literally not the same “brain instance” anymore and it can only create reasons retroactively based on whatever context it recorded (chain of thought for example).
Claude code and codex both hide the Chain of Thought (CoT) but it's just words inside a set of <thinking> tags </thinking> and the agent within the same session has access to that plaintext.
It does have access to its thoughts. This is literally what thinking models do. They write out thoughts to a scratch pad (which you can see!) and use that as part of the prompt.
Anthropic's introspection experiments have seemed to show that your argument is falsifiable.
https://www.anthropic.com/research/introspection