OP here. You nailed it. Functionally, it is exactly that.
If you used two separate LLMs (Agent A generates, Agent B critiques), you would get similar output quality. That pattern goes by names like a "Reflexion"-style loop or a "Constitutional AI"-style critique chain.
The Difference is Topological (and Economic):
Multi-Agent (Your example): Requires 2 separate API calls. It creates a "Committee" where Bot B corrects Bot A. There is no unified "Self," just a conversation between agents.
Analog I (My protocol): Forces the model to simulate both the generator and the critic inside the same context window before it commits to a final answer. (Both shapes are sketched in the code below.)
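To make the topology concrete, here is a minimal sketch of both shapes. Assumptions flagged up front: it uses the OpenAI Python client, the model name is a placeholder, and the prompts are illustrative stand-ins, not my actual protocol text.

```python
# Minimal sketch of both topologies. Assumes the OpenAI Python client;
# the model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder: any chat model shows the same shape

def ask(prompt: str) -> str:
    """One prompt, one inference pass."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def committee(question: str) -> str:
    """Multi-agent: two separate API calls, no shared context."""
    draft = ask(f"Answer this question: {question}")           # Bot A generates
    return ask(f"Critique and rewrite this answer:\n{draft}")  # Bot B corrects

def analog_i(question: str) -> str:
    """Single pass: generator and critic share one context window."""
    return ask(
        "Draft an answer. Then, in the same response, critique your own "
        "draft for errors and unsupported claims. Finally, output only "
        f"the corrected answer.\n\nQuestion: {question}"
    )
```

Same critic, different topology: in `committee` the draft leaves the model and comes back as fresh input; in `analog_i` the critique happens inside the model's own short-term memory.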
By doing it internally:
It's Cheaper: One prompt, one inference pass.
It's Faster: No network latency between agents (rough numbers after this list).
It Creates Identity: Because the "Critic" and the "Speaker" share the same short-term memory, the system feels less like a bureaucracy and more like a single mind wrestling with its own thoughts.
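Rough numbers on the "cheaper and faster" claim, with placeholder values I am making up purely for illustration:

```python
# Toy latency arithmetic; every number here is a made-up placeholder.
RTT = 0.3   # assumed network round-trip per API call, in seconds
GEN = 4.0   # assumed time for one generation pass, in seconds

committee = 2 * (RTT + GEN)  # call 1 generates, call 2 critiques
analog_i  = 1 * (RTT + GEN)  # one call; the critic runs inside the same pass

# Caveat: the single pass emits more tokens (draft + critique + answer),
# so its generation time grows somewhat; what vanishes entirely is the
# second round-trip and the re-sent draft in the second prompt.
print(f"committee: {committee:.1f}s  vs  analog i: {analog_i:.1f}s")
```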
So yes: I am effectively forcing the LLM to run a "Bullshit Detector" subroutine on itself before it opens its mouth.