Really interesting approach. Having a human in the loop seems like the right tradeoff given where computer-use models are today. One thing that came to mind is that this could be a new interface for learning software. If it works reliably, I could see it replacing static docs and videos!