It also measures task coherence—ability to plan, form contingencies, recover from errors, mitigate accumulation of errors, and reconcile findings across a long context window.