Haven't tried Claude for this, but I can't see how it could possibly do it. I built a game bot using the Win32 API to send input, screen capture fed into OCR to read text, and some OpenCV to recognize game elements. Dead simple, and actually quite boring and repetitive after I'd worked on it for a while. How could Claude agents possibly do this? I did use Claude to look up docs and APIs, though.
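For anyone wondering what the "recognize game elements" step boils down to, here is a toy sketch, assuming plain template matching on grayscale pixel grids. A real bot would more likely use OpenCV's cv2.matchTemplate on captured frames; the function name `find_template` and the tiny grids are made up for illustration and are not the actual bot's code.

```python
def find_template(screen, template):
    """Slide `template` over `screen` (both 2D lists of grayscale ints)
    and return ((row, col), score) for the position with the lowest
    sum of absolute differences. A score of 0 is an exact match."""
    screen_h, screen_w = len(screen), len(screen[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(screen_h - tmpl_h + 1):
        for c in range(screen_w - tmpl_w + 1):
            # Compare every template pixel with the screen pixel under it.
            score = sum(
                abs(screen[r + i][c + j] - template[i][j])
                for i in range(tmpl_h)
                for j in range(tmpl_w)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Hypothetical 4x5 "screenshot" with a 2x2 "icon" placed at row 1, column 2.
icon = [[9, 8], [7, 6]]
screen = [[0] * 5 for _ in range(4)]
for i in range(2):
    for j in range(2):
        screen[1 + i][2 + j] = icon[i][j]

position, score = find_template(screen, icon)
```

The brute-force loop is only there to show the idea; on real screenshots you'd want a library routine and a score threshold instead of always taking the best match.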
That actually sounds like something Claude could do pretty easily.
Yegge's book describes his coauthor's first vibe coding project. It went through screenshots he'd saved of YouTube videos, read the timestamp with OCR, looked up transcripts, and generated video snippets with subtitles added. (I think this was before YouTube added subtitles itself.) He had it done in 45 minutes.
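To give a feel for how small the glue logic in a pipeline like that can be, here is a rough sketch of just the "timestamp to transcript lines" part. It assumes the OCR step has already produced a text string and that the transcript is a list of (start_seconds, text) pairs; `parse_timestamp`, `transcript_window`, and the window sizes are hypothetical names and values, not anything taken from the book.

```python
import re


def parse_timestamp(ocr_text):
    """Return the first H:MM:SS or M:SS timestamp in `ocr_text` as
    seconds, or None if no timestamp is found."""
    match = re.search(r"(?:(\d+):)?(\d{1,2}):(\d{2})", ocr_text)
    if match is None:
        return None
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2))
    seconds = int(match.group(3))
    return hours * 3600 + minutes * 60 + seconds


def transcript_window(transcript, t, before=5, after=10):
    """Return the text of transcript entries whose start time falls
    within [t - before, t + after] seconds."""
    return [text for start, text in transcript if t - before <= start <= t + after]


# Hypothetical OCR output from a player's progress bar, and a fake transcript.
ocr_text = "12:34 / 45:00"
transcript = [(740.0, "a"), (752.0, "b"), (760.0, "c"), (800.0, "d")]

t = parse_timestamp(ocr_text)
lines = transcript_window(transcript, t)
```

Cutting the actual video snippet and burning in subtitles would be a separate step handled by a video tool, which this sketch leaves out.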
And using agents to control other applications is pretty common.
My elementary schooler did this with pictures of his stuffed animals last week. I helped a little bit, but most of it was Claude. He's never coded before.