I think what's more salient here (certainly in the long run) is setting up adversarial agents for testing/verification; that has been a big win for me in multi-agent workflows. When Anthropic first released Claude's "computer use" capability, that was a big step toward closing this loop and avoiding the manual babysitting involved in larger projects. PSA that it's not a silver bullet: the "analyzer" agent can still get tripped up and falsely declare something broken (or functional), but it greatly reduces the "Hey, I've done the task" responses when the task is not actually done or the output is broken.
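To make the pattern concrete, here's a minimal sketch of the loop I mean, in Python. Everything in it is illustrative: `worker` and `verifier` are hypothetical stand-ins for however you invoke your agents (an LLM API call, a framework, a sandboxed tool runner), and the prompts are just one way to phrase the adversarial framing.

```python
from typing import Callable

# Hypothetical agent interface: prompt in, text out.
Agent = Callable[[str], str]

def run_with_adversarial_check(
    task: str,
    worker: Agent,
    verifier: Agent,
    max_rounds: int = 3,
) -> str:
    """Ask `worker` to do the task, then have `verifier` try to break it.

    The verifier is prompted adversarially: its job is to find reasons
    the output is wrong, not to rubber-stamp it. If it reports a problem,
    the worker gets another round with that critique as feedback. Per the
    caveat above, the verifier can itself be wrong in either direction,
    so `max_rounds` keeps a confused verifier from looping forever.
    """
    feedback = ""
    output = ""
    for _ in range(max_rounds):
        output = worker(f"Task: {task}\nPrior feedback: {feedback or 'none'}")
        verdict = verifier(
            "You are an adversarial reviewer. Try to prove the output below "
            "does NOT satisfy the task. Reply 'PASS' only if you cannot.\n"
            f"Task: {task}\nOutput: {output}"
        )
        if verdict.strip().upper().startswith("PASS"):
            return output  # verifier failed to break it
        feedback = verdict  # feed the critique into the next attempt
    return output  # best effort after exhausting the retry budget
```

The key design choice is that the verifier never sees the worker's reasoning, only the task and the artifact, which makes it harder for the two agents to agree on the same mistake.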
So it is not reliable enough to run automatically?