logoalt Hacker News

EMM_386yesterday at 8:28 PM1 replyview on HN

> Claude is definitely not taking screenshots of that desktop & organizing, it's using normal file management cli tools

Are you sure about that?

Try "claude --chrome" with the CLI tool and watch what it does in the web browser.

It takes screenshots all the time to feed back into the multimodal vision and help it navigate.

It can look at the HTML or the JavaScript but Claude seems to find it "easier" to take a screenshot to find out what exactly is on the screen. Not parse the DOM.

So I don't know how Cowork does this, but there is no reason it couldn't be doing the same thing.


Replies

dalenwyesterday at 9:06 PM

I wonder if there's something to be said about screenshots preventing context poisoning vs parsing. Or in other words, the "poison" would have to be visible and obvious on the page where as it could be easily hidden in the DOM.

And I do know there are ways to hide data like watermarks in images but I do not know if that would be able to poison an AI.