Workflow failures signal violated assumptions, and those should ultimately percolate up to humans. Static DAGs are also easier for humans to understand than dynamic task decomposition. Robustness in production is still good, though, if you can bound agent behavior.
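A rough sketch of what I mean by a static DAG whose failures surface instead of being retried away. The names `run_static_dag` and `AssumptionViolation` are made up for illustration; only `graphlib.TopologicalSorter` is a real stdlib API.

```python
from graphlib import TopologicalSorter


class AssumptionViolation(Exception):
    """Hypothetical error type: a step failed and a human should look at it."""


def run_static_dag(steps, deps):
    """Run steps in dependency order; stop and escalate on the first failure.

    steps: dict of step name -> callable taking the results so far
    deps:  dict of step name -> set of step names that must run first
    """
    results = {}
    # The graph is fixed up front, so the execution order can be read
    # (and reviewed) before anything runs.
    for name in TopologicalSorter(deps).static_order():
        try:
            results[name] = steps[name](results)
        except Exception as exc:
            # No silent retry or replanning: surface the failure with context.
            raise AssumptionViolation(f"step {name!r} failed: {exc}") from exc
    return results
```

Usage would look like `run_static_dag({"fetch": f, "parse": g}, {"fetch": set(), "parse": {"fetch"}})`. Whether you page someone or just log on `AssumptionViolation` is a separate choice.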
Best-of-3 (or more) tournaments are a good strategy. You can also use them for RL via GRPO if you're running an open-weight model.
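Roughly, assuming you have some scoring function (a judge model, a test suite, whatever), the selection step and the GRPO-style group-relative advantage look like this. `best_of_n` and `group_relative_advantages` are illustrative names, not any library's API, and a real tournament might use pairwise comparisons instead of a scalar score.

```python
import statistics


def best_of_n(candidates, score):
    """Pick the highest-scoring candidate. `score` is a hypothetical judge."""
    return max(candidates, key=score)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    This is the group-relative idea behind GRPO: the baseline is the
    group mean rather than a learned value function.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]
```

The same N samples serve both purposes: at inference you keep the winner, and for training the scores of the whole group become the advantages.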
In HNese this means "very impressive, keep up the good work."