> There are often also invariants that can be used to identify bugs without ground truth, e.g., if you render the page at slightly different widths, you can make some assertions about how far elements will move.
That's really interesting and sounds useful! I'm wondering if there are general guidelines/requirements (not specific to browsers) that could kind of "trigger" those things in the agent, without explicitly telling it. I think generally that's how I try to approach prompting.
I think if you explain that general idea, the models can figure it out well enough to write it into an implementation plan, at least some of the time. Interesting problem though.
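
To make the quoted idea concrete, here is a minimal sketch of a width invariant checked without any ground-truth rendering. Everything in it is made up for illustration: `Box`, `layout`, `buggy_layout`, and `shift_violations` are hypothetical names, and `layout` is a toy stand-in for a real renderer that only handles fixed-width boxes aligned left, center, or right. The assumed invariant is that widening the viewport by `delta` should not move any such box horizontally by more than `delta`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    width: int
    align: str  # "left", "center", or "right"


def layout(boxes, viewport_width):
    """Toy renderer (assumption, not a real engine): returns {name: x offset}."""
    xs = {}
    for b in boxes:
        free = viewport_width - b.width
        if b.align == "left":
            xs[b.name] = 0.0
        elif b.align == "center":
            xs[b.name] = free / 2
        elif b.align == "right":
            xs[b.name] = float(free)
        else:
            raise ValueError(f"unknown alignment: {b.align}")
    return xs


def buggy_layout(boxes, viewport_width):
    """Same as layout, but with a deliberately injected bug for centered boxes."""
    xs = layout(boxes, viewport_width)
    for b in boxes:
        if b.align == "center":
            xs[b.name] = float((viewport_width - b.width) * 2)  # bug: should be / 2
    return xs


def shift_violations(render, boxes, width, delta):
    """Render at two nearby widths and list boxes that moved more than delta."""
    before = render(boxes, width)
    after = render(boxes, width + delta)
    return [
        name
        for name in before
        if abs(after[name] - before[name]) > abs(delta)
    ]


BOXES = [
    Box("logo", 100, "left"),
    Box("hero", 400, "center"),
    Box("menu", 200, "right"),
]

if __name__ == "__main__":
    print(shift_violations(layout, BOXES, 800, 10))
    print(shift_violations(buggy_layout, BOXES, 800, 10))
```

The check never needs to know where a box should be, only how far it is allowed to move between two renders, which is the kind of general property you could describe to an agent without naming a specific test.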