buchwald • last Monday at 12:09 AM • 1 reply • view on HN

Claude is surprisingly bad at visual understanding. I did a similar thing to OP where I wanted Claude to visually iterate on Storybook components. I found outsourcing the visual check to Playwright in vision mode (as opposed to using the default a11y tree) and Codex for understanding worked best. But overall the idea of a visual inspection loop went nowhere. I blogged about it here: https://solbach.xyz/ai-agent-accessibility-browser-use/

Replies

MagMueller • last Monday at 2:48 AM

Interesting read. Agree that GUI is super hard for agents. Did you see "skills" from browser-use? We directly interact with network requests now.
