logoalt Hacker News

augusteoyesterday at 7:18 PM1 replyview on HN

The shift from "click and hope" to explicit post-conditions is the right framing.

We've been building agent-based automation and the reliability problem is brutal. An agent can be 95% accurate on each step, but chain ten steps together and you're at 60% success rate. That's not usable.

Curious about the failure modes though. What happens when the verification itself is wrong? Like, the cart shows updated on screen but the verification layer checks a stale element?


Replies

tonywwyesterday at 7:39 PM

Absolutely agree on the compounding error point - that’s exactly what pushed us toward verification.

On “verification wrong”: we try hard to keep predicates grounded and re-evaluated, not “check a cached handle”. Assertions do re-snapshot / re-query during each retry, and we scope them to signals that should change (URL, existence/state of an element, text/value).

If the page is flaky/stale, the assertion just won’t prove the condition within the retry window and we fail with artifacts such as frames of clip (if ffmpeg available) rather than claiming success.

There are still edge cases (virtualized DOM, optimistic UI, async updates), but in those cases the goal is the same: make the failure explicit and debuggable with artifacts and time-travel traces, not silently drift.