Hacker News

RS-232 yesterday at 9:12 PM (4 replies)

Has anyone had success using 2 agents, with one as the creator and one as an adversarial "reviewer"? Is the output usually better or worse?


Replies

mapontosevenths yesterday at 9:50 PM

This is how it's meant to be done, usually with the reviewer being the stronger model.

That said, with both the test-driven development this post describes and the reviewer model (it's best to do both), you have to provide an escape hatch for the model. If you let the model get inescapably stuck on an impossible test or set of constraints, it will just start deleting tests or rewriting the entire codebase in Rust or something.

My escape hatch is "expert advice". I let the weak LLM phone a friend when it's stuck and ask a smarter LLM for assistance. It has since stopped going crazy and replacing all my tests with gibberish... mostly.
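
For anyone who wants to picture the loop, here is a rough Python sketch of a creator/reviewer cycle with an "expert advice" escape hatch. It is not the setup described above, just one way it could look. The names `run_with_escape_hatch`, `creator`, `reviewer`, `expert`, `max_rounds`, and `stuck_after` are all made up for illustration, and the three callables stand in for whatever model calls you actually use.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for model calls; none of these are a real agent API.
Creator = Callable[[str, str], str]              # (task, feedback) -> candidate
Reviewer = Callable[[str, str], Optional[str]]   # (task, candidate) -> objection, or None if accepted
Expert = Callable[[str, str, str], str]          # (task, candidate, objection) -> advice


def run_with_escape_hatch(
    task: str,
    creator: Creator,
    reviewer: Reviewer,
    expert: Expert,
    max_rounds: int = 6,
    stuck_after: int = 2,
) -> Optional[str]:
    """Loop creator -> reviewer; escalate to the expert when the same objection repeats."""
    feedback = ""
    last_objection: Optional[str] = None
    repeats = 0
    for _ in range(max_rounds):
        candidate = creator(task, feedback)
        objection = reviewer(task, candidate)
        if objection is None:
            return candidate  # reviewer accepted the work
        # Count how many rounds in a row the reviewer raised the same objection.
        repeats = repeats + 1 if objection == last_objection else 1
        last_objection = objection
        if repeats >= stuck_after:
            # Escape hatch: assume the creator is stuck and ask the stronger model.
            feedback = "Expert advice: " + expert(task, candidate, objection)
            repeats = 0
        else:
            feedback = "Reviewer objection: " + objection
    return None  # give up and hand back to a human rather than let the agent thrash
```

Returning `None` after `max_rounds` is a deliberate choice in this sketch: the point of the hatch is that the weak model always has somewhere to go other than deleting tests.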

sanxiyn yesterday at 9:49 PM

That works well. Anthropic wrote a writeup on it.

https://www.anthropic.com/engineering/harness-design-long-ru...

esafak yesterday at 9:20 PM

This is routine. We have Gemini (which is not our coding model) review our PRs and it genuinely catches mistakes. Even using the same model as the creator, without its context to bias it, would probably catch many mistakes.
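
A minimal sketch of the "without its context" part, assuming the creator's session is available as a plain object: the reviewer prompt is built from the task and the diff only, and the creator's own messages are left out on purpose. `CreatorSession` and `build_review_prompt` are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CreatorSession:
    """Hypothetical record of what the creator agent saw and produced."""
    task: str
    messages: List[str] = field(default_factory=list)  # creator's reasoning and chat history
    diff: str = ""


def build_review_prompt(session: CreatorSession) -> str:
    """Build the reviewer's input from the task and diff only.

    session.messages is deliberately ignored so the reviewer is not
    biased by the creator's justifications.
    """
    return (
        "Review this change for mistakes.\n"
        f"Task: {session.task}\n"
        f"Diff:\n{session.diff}\n"
    )
```

The same function works whether the reviewer is a different model or a fresh instance of the creator's model, since the only thing it controls is what the reviewer gets to read.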

peytongreen_dev yesterday at 11:06 PM

[dead]