I used this in earnest yesterday on my Zillow saved listings. I prompted it to analyze the listings (I've got about 70 saved) and summarize the most recent price drop for each one, and it mostly failed at the task. It gave the impression that it paginated through all the listings, but I don't think it actually did. I suspect the mechanism it uses, clicking links, taking screenshots, and analyzing those, is some kind of token-efficiency trade-off (as opposed to consuming the DOM), and it seems not great for this kind of task.
As a reformed AI skeptic I see the promise in a tool like this, but this is light years behind other Anthropic products in terms of efficacy. Will be interesting to see how it plays out though.
What an asinine strategy, feeding it screenshots (does it scroll down and render the whole page?)
I had good luck treating HTML as XML and having Claude write XPath queries to grab the useful data without ingesting the whole damn DOM.
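Roughly, a minimal sketch of that approach with Python's lxml; the URL and XPath expressions are made-up placeholders, not any site's actual markup:

```python
# Parse the page once, then hand the LLM only the results of targeted
# XPath queries instead of the whole DOM.
import requests
from lxml import html

resp = requests.get("https://example.com/listings")  # placeholder URL
tree = html.fromstring(resp.content)

# Pull just the fields you care about; each query returns a short list of strings.
prices = tree.xpath('//span[contains(@class, "price")]/text()')      # assumed class name
addresses = tree.xpath('//address/text()')                            # assumed element

for addr, price in zip(addresses, prices):
    print(addr.strip(), price.strip())
```

The point is that the model only ever sees the few dozen extracted strings, not megabytes of markup.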
It would be interesting to see if this works in Playwright using your existing browser's remote-control APIs (using Claude Code via the Playwright MCP).
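Playwright can attach to an already-running browser over CDP, so a rough sketch of the idea might look like this (the debugging port and target URL are assumptions, and the Playwright MCP server wires things up its own way under the hood):

```python
# Attach Playwright to a browser you already have open, so saved logins
# and cookies are available. Assumes Chrome was started with
# --remote-debugging-port=9222.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    context = browser.contexts[0]                    # reuse the existing profile/context
    page = context.pages[0] if context.pages else context.new_page()
    page.goto("https://www.zillow.com/myzillow/favorites")  # example target
    print(page.title())
```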
> light years behind
So... give it another 3 months? (I assume we're talking AI light years.)
LLMs struggle with time (or don't really have a concept of time). So unless that is addressed, they'll always suck at these tasks, since you need synchronization. This is why text/CLI was a much better UX to work with. stdin/stdout is the best way to go, but someone has to release something to keep pumping the numbers.
Sometimes I find that it helps if my prompt directly names the tools I want the LLM to use, e.g. I'll tell it "do a WebFetch of so and so", etc.