What has worked better for me is splitting authority, not just prompts. One agent can touch app code...

pastescreenshot • yesterday at 8:05 PM • 1 reply • view on HN

What has worked better for me is splitting authority, not just prompts. One agent can touch app code, one can only write failing tests plus a short bug hypothesis, and one only reviews the diff and test output. Also make test files read only for the coding agent. That cuts out a surprising amount of self-grading behavior.

Replies

huslage • yesterday at 9:25 PM

How do you limit access like that?

alt Hacker News

Replies