Even Anthropic's own research articles consistently show that they themselves use one agent and just tune the harness around it.
I ignore all Skills and MCPs and view them as distractions that consume context, which leads to worse performance. It's better to observe what the agent is doing and where it needs help, and just throw a few bits of helpful, sometimes persistent, context at it.
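A minimal sketch of what I mean, assuming a hypothetical call_model(system=..., messages=...) helper that wraps an LLM API and returns {"content": str, "done": bool}:

    # The persistent notes are the "few bits of helpful context",
    # re-injected every turn instead of Skill/MCP catalogs.
    PERSISTENT_NOTES = [
        "Run `make test` before claiming a task is done.",
        "Never touch files under vendor/.",
    ]

    def run_agent(task: str, max_turns: int = 20) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_turns):
            # A small, fixed system prompt keeps the window free
            # for the actual work.
            system = "You are a coding agent.\n" + "\n".join(PERSISTENT_NOTES)
            reply = call_model(system=system, messages=messages)  # hypothetical
            if reply["done"]:
                return reply["content"]
            messages.append({"role": "assistant", "content": reply["content"]})
            # This is where a human watching the transcript can append a
            # corrective note to PERSISTENT_NOTES when the agent drifts.
        return "gave up after max_turns"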
You can't observe what 20 agents are doing.
Yes, but you can observe the agent observing what 20 agents are doing! /s
Now I see why Grey Walter made artificial tortoises in the 50s - he foresaw that it would be turtles all the way down.
For most tasks, I agree: one agent with a good harness wins. The case for multiple agents is when the context required to solve the problem exceeds what one agent can hold. This Putnam problem needed more working context than fits in a single window. Decomposing into subgoals lets each agent work with a focused context instead of one agent suffocating on state. Ideally, multi-agent approaches shouldn't add more overall complexity, but that requires better tooling for observation and the like, as you describe.
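A rough sketch of that decomposition, assuming hypothetical plan_subgoals, solve_subgoal, and combine helpers that each wrap a model call; the point is that every subgoal gets a fresh, focused context rather than one agent carrying all accumulated state:

    def solve_with_decomposition(problem: str) -> str:
        subgoals = plan_subgoals(problem)  # one planning call up front
        results: list[str] = []
        for goal in subgoals:
            # Each worker sees only the problem statement, its own goal,
            # and prior results, not the full transcripts of sibling agents.
            focused = {"problem": problem, "goal": goal, "prior": results}
            results.append(solve_subgoal(focused))
        return combine(problem, results)  # final synthesis over summaries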