logoalt Hacker News

visargatoday at 1:01 PM0 repliesview on HN

I made a harness that preserves memory for both user messages and task execution. One reason this works is related to judge agents - they can't review information that was not written down. So I track everything in my harness. The judge agents bring the most benefit, based on my evals. The coding agent can execute a task without all the ceremony just as well, but judging needs something to grasp on, besides code. And adding new perspectives helps a lot, it is the most useful intervention. My flow is - user emits a task, the agent plans, then judge agents review the plan, then main agent executes, then judge again reviews the execution. Might consume more tokens to track execution and judgements, but worth it.