Any ideas on how to solve the "agents don't have complete common sense" problem?
I have found, when using agents to verify agents, that the verifying agent might observe something a human would immediately find off-putting and obviously wrong, but it doesn't raise any flags for the smart-but-dumb agent.
To clarify, are you using the "fast brain, slow brain" pattern? Maybe an example would help.
Broadly speaking, we see people experiment with this architecture a lot, often with a great deal of success. Another approach would be an agent orchestrator architecture with an intent-recognition agent that routes to different sub-agents, roughly like the sketch below.
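As a rough illustration of that routing pattern, here is a minimal sketch. The `call_model` helper and the intent labels are placeholders for whatever LLM client and task taxonomy you actually use, not a specific library's API:

```python
# Sketch of an orchestrator that routes by recognized intent.
from typing import Callable, Dict

def call_model(prompt: str) -> str:
    """Placeholder: replace with a real call to your LLM provider."""
    raise NotImplementedError

def classify_intent(user_message: str) -> str:
    """Intent-recognition agent: ask the model to pick exactly one label."""
    labels = ["billing", "technical_support", "general"]
    prompt = (
        f"Classify the user's message into exactly one of {labels}. "
        f"Reply with the label only.\n\nMessage: {user_message}"
    )
    label = call_model(prompt).strip().lower()
    return label if label in labels else "general"

def billing_agent(msg: str) -> str:
    return call_model(f"You are a billing specialist. Help with: {msg}")

def support_agent(msg: str) -> str:
    return call_model(f"You are a technical support agent. Help with: {msg}")

def general_agent(msg: str) -> str:
    return call_model(f"You are a helpful assistant. Help with: {msg}")

SUB_AGENTS: Dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "technical_support": support_agent,
    "general": general_agent,
}

def orchestrate(user_message: str) -> str:
    """Route the message to the sub-agent matching its recognized intent."""
    intent = classify_intent(user_message)
    return SUB_AGENTS[intent](user_message)
```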
Obviously there are endless possible cases in production, and the best approach is to build your evals using that data.
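For example, a minimal eval harness seeded from production traffic could look like the sketch below. The `orchestrate` entry point and the JSONL case format are assumptions carried over from the sketch above; swap in your own agent and logging format:

```python
# Sketch: replay cases mined from production logs against the agent under test.
import json

def load_cases(path: str) -> list[dict]:
    """Each line: {"input": ..., "must_contain": ...} taken from real traffic."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def run_evals(cases: list[dict]) -> float:
    passed = 0
    for case in cases:
        output = orchestrate(case["input"])  # agent under test (hypothetical)
        if case["must_contain"].lower() in output.lower():
            passed += 1
        else:
            print(f"FAIL: {case['input']!r} -> {output!r}")
    return passed / len(cases)

if __name__ == "__main__":
    cases = load_cases("production_cases.jsonl")  # assumed file name
    print(f"pass rate: {run_evals(cases):.1%}")
```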