The test harness point is the one that really sticks for me too. We've been using agentic loops...

CloakHQ • today at 9:18 AM • 1 reply • view on HN

The test harness point is the one that really sticks for me too. We've been using agentic loops for browser automation work, and the domain has a natural validation signal: either the browser session behaves the way a real user would, or it doesn't. That binary feedback closes the loop really cleanly.

The tricky part in our case is that "behaves correctly" has two layers - functional (did it navigate correctly?) and behavioral (does it look human to detection systems?). Agents are fine with the first layer but have no intuition for the second. Injecting behavioral validation into the loop was the thing that actually made it useful.

The .md scratch pad between sessions is underrated. We ended up formalizing it into a short decisions log - not a summary of what happened, just the non-obvious choices and why. The difference between "we tried X" and "we tried X, it failed because Y, so we use Z instead" is huge for the next session.

Replies

Schlagbohrer • today at 9:29 AM

What are you developing that technology for?

➕ show 1 reply

alt Hacker News

Replies