I just feel this is a great example of someone falling into the common trap of treating an LLM like a human.
They are vastly less intelligent than a human, and logical leaps that make sense to you may make no sense to Claude. It has no concept of aesthetics and, of course, no vision.
All that said, it got pretty close even with those impediments! (It got worse because the writer tried to force it to act more like a human would.)
I think a better approach would be to write a tool that compares screenshots, identifies misplaced items, and outputs that as a text finding/failure state. Claude will work much better because you're dodging the bits that are too interpretive (which humans rock at and LLMs don't).
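A minimal sketch of the kind of comparison tool described above, assuming screenshots are available as 2D grids of RGB tuples. The function name, the grid representation, and the noise threshold are all illustrative, not from the original post; a real version would likely decode PNGs with an image library instead.

```python
def diff_report(expected, actual, threshold=10):
    """Compare two 'screenshots' (2D lists of (r, g, b) tuples) and
    return a text finding/failure state an LLM can act on directly,
    instead of asking it to interpret pixels itself."""
    if len(expected) != len(actual) or len(expected[0]) != len(actual[0]):
        return (f"FAIL: size mismatch "
                f"{len(expected[0])}x{len(expected)} vs "
                f"{len(actual[0])}x{len(actual)}")
    # Collect pixels whose channels differ by more than the noise threshold
    # (tolerates anti-aliasing and compression artifacts).
    bad = [(x, y)
           for y, (erow, arow) in enumerate(zip(expected, actual))
           for x, (e, a) in enumerate(zip(erow, arow))
           if any(abs(ec - ac) > threshold for ec, ac in zip(e, a))]
    if not bad:
        return "PASS: screenshots match"
    xs = [x for x, _ in bad]
    ys = [y for _, y in bad]
    return (f"FAIL: content misplaced/changed in region "
            f"x={min(xs)}..{max(xs)}, y={min(ys)}..{max(ys)}")

# Example: a 4x4 all-white frame where one dark pixel has appeared
white = [[(255, 255, 255)] * 4 for _ in range(4)]
moved = [row[:] for row in white]
moved[1][2] = (0, 0, 0)  # a misplaced dark element
print(diff_report(white, moved))
# → FAIL: content misplaced/changed in region x=2..2, y=1..1
```

The point is the output format: a bounded, textual failure state ("region x=2..2, y=1..1") is something an LLM can reason about far more reliably than "does this look right to you?".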
The blog frequently refers to the LLM as "him" instead of "it" which somehow feels disturbing to me.
I love to anthropomorphize things like rocks or plants, but something about doing it to an AI that responds in human-like language enters an uncanny valley, or otherwise upsets me.
> vastly less intelligent than a human
I would phrase it more like they are a completely alien “intelligence” that can't really be compared to human intelligence.