The "Council of models" is a good first step, but I ultimately settled on an automated talent acquisition pipeline.
I have a BIRTHING_POOL.md that combines the best AGENTS.md files and introduces random AI-generated mutations and deletions. The candidates are tested using take-home PRs, which are reviewed by HR.md and TECH_MANAGER.md. TECH_MANAGER.md measures completion rate per token (effectiveness) and then sends the stack ranking of AGENTS.md files to HR.md to manage the talent pool. If agent effectiveness drops low enough, we pull from the birthing pool and interview more candidates.
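For anyone who wants the loop spelled out, here is a rough Python sketch of the rank-cull-refill cycle described above. Everything in it is a stand-in: `Candidate`, `stack_rank`, `spawn`, `manage`, the `floor` threshold, and the `p_delete` rate are hypothetical names and values, not the actual contents of any of those .md files. It models only the random deletions; the AI-generated mutation step would need a model call and is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """One AGENTS.md variant plus its take-home results (hypothetical shape)."""
    name: str
    prompt_lines: list[str]
    completed: int = 0  # take-home PRs that passed review
    tokens: int = 0     # tokens spent producing them

    @property
    def effectiveness(self) -> float:
        # Completion rate per token; zero if the candidate has not run yet.
        return self.completed / self.tokens if self.tokens else 0.0


def stack_rank(pool):
    """TECH_MANAGER step: order candidates from most to least effective."""
    return sorted(pool, key=lambda c: c.effectiveness, reverse=True)


def spawn(parents, rng, name, p_delete=0.1):
    """BIRTHING_POOL step: merge the parents' lines, then randomly delete some."""
    merged = []
    for parent in parents:
        for line in parent.prompt_lines:
            if line not in merged:
                merged.append(line)
    kept = [line for line in merged if rng.random() >= p_delete]
    # Keep at least one line so a new hire never starts with an empty prompt.
    return Candidate(name, kept or merged[:1])


def manage(pool, rng, floor, top_k=2):
    """HR step: drop candidates below the floor and backfill from the top performers."""
    ranked = stack_rank(pool)
    survivors = [c for c in ranked if c.effectiveness >= floor]
    parents = ranked[:top_k]
    openings = len(pool) - len(survivors)
    hires = [spawn(parents, rng, f"hire-{i}") for i in range(openings)]
    return survivors + hires


pool = [
    Candidate("a", ["be terse", "run tests"], completed=8, tokens=1000),
    Candidate("b", ["ask first"], completed=2, tokens=1000),
    Candidate("c", ["run tests", "small diffs"], completed=5, tokens=1000),
]
new_pool = manage(pool, random.Random(0), floor=0.004)
```

With these made-up numbers, "b" falls below the floor and is replaced by a hire bred from "a" and "c". New hires start with zero effectiveness, so a real version would have to run their take-home PRs before the next ranking pass.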
The end result is that it effectively manages a wider range of agent talents, and you avoid the agent hive-mind spirals you get when every worker has the same system prompt.
Is this satire? I can't tell any more.