Taking screenshots and recording is not quite the same as "seeing". A camera doesn't see things. If the tool can identify issues and improvements to make, by analyzing the screenshot, that's I think useful.
I read it in the same vein as saying that a sub's sonar enables "seeing" its surroundings. The focus is on having a spatial sensor rather than on the qualia of how that sensation is afterwards processed/felt.
> If the tool can identify issues and improvements (...)
Tools like Claude and the like can, and do. This is just a utility to make the process easier.
> It’s not a testing framework. The agent doesn’t decide pass/fail. It just gives me the evidence so I don’t have to open the browser myself every time.
From the OP, i don't think this is what is meant for what you are saying.