Reliability has been the real bottleneck for multi-agent setups in production. The hard part isnt getting one agent to do something clever once - its making repeated runs observable and bounded when tools fail halfway through. Idempotency checks, explicit handoff state, and human review gates have mattered more for us than adding another model or another agent role.
New account spamming LM replies, nice