logoalt Hacker News

buchwaldlast Monday at 12:09 AM1 replyview on HN

Claude is surprisingly bad at visual understanding. I did a similar thing to OP where I wanted Claude to visually iterate on Storybook components. I found outsourcing the visual check to Playwright in vision mode (as opposed to using the default a11y tree) and Codex for understanding worked best. But overall the idea of a visual inspection loop went nowhere. I blogged about it here: https://solbach.xyz/ai-agent-accessibility-browser-use/


Replies

MagMuellerlast Monday at 2:48 AM

Interesting read. Agree that GUI is super hard for agents. Did you see "skills" from browser-use? We directly interact with network requests now.