Can't you just look at the diffs? Not sure what the point of using Claude is if you have to babysit every change it makes; that kind of defeats the purpose. Would you sit watching a junior dev's every keystroke?
Depends on the work you're doing. Cookie-cutter / derivative work like I do for some hobby projects? Sure, it can run that on near full auto. More abstract or cutting-edge stuff like in academic research environments? It needs correction at just about every step. Your workflow sounds like it deals with the former, which is fine, but that isn't everyone.
Nobody said every keystroke. That’s not like for like.
I don't sit there watching every session either—that's definitely not the point.
It's more like standard observability. You don't watch your server logs all day, but when an error spikes, you need deep tracing to find out why.
I use this when the agent gets stuck on a simple task or the context window fills up way faster than expected. The tool lets me "drill down" into the tool outputs and execution tree to see exactly where the bad loop started.
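To make "where the bad loop started" concrete, here is a rough sketch of the kind of check I mean. It is not how the tool actually works internally. It assumes each session has already been parsed into a list of hypothetical `(tool_name, argument)` tuples, and it only catches the simplest case, where the exact same call is repeated back to back.

```python
def find_loop_start(events, min_repeats=3):
    """Return the index where the first run of identical consecutive
    events begins, if that run is at least min_repeats long.
    Returns None when no such run exists.

    `events` is a hypothetical, already-parsed list of
    (tool_name, argument) tuples; real session logs will differ.
    """
    run_start = 0
    for i in range(1, len(events) + 1):
        # A run ends at the end of the list or when the event changes.
        if i == len(events) or events[i] != events[run_start]:
            if i - run_start >= min_repeats:
                return run_start
            run_start = i
    return None


# Made-up session: the agent re-runs the same test three times in a row.
session = [
    ("read", "a.py"),
    ("edit", "a.py"),
    ("test", "."),
    ("test", "."),
    ("test", "."),
    ("edit", "b.py"),
]
print(find_loop_start(session))  # index of the first repeated "test" call
```

Longer cycles (edit, test, edit, test, ...) would need a smarter check than this; the point is just that you want the index of the first bad step, not a wall of raw log lines.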
If you're running multiple parallel sessions across different terminal tabs, trying to grep through raw logs to find a specific failure is a massive productivity sink. This is for when things go sideways and you need to solve it in seconds, not for babysitting every keystroke.