Thanks, interesting. Does this make it more surprising that the other benchmarks have improved? I'm not sure I understand the benchmarks well enough - but I'm wondering whether with agentic workflows it's possible to get away with a smaller more focussed context (and hence lower cost) whilst achieving the same or better performance, because of agentic model's ability to decide what the put in context as they work
Thanks, interesting. Does this make it more surprising that the other benchmarks have improved? I'm not sure I understand the benchmarks well enough - but I'm wondering whether with agentic workflows it's possible to get away with a smaller more focussed context (and hence lower cost) whilst achieving the same or better performance, because of agentic model's ability to decide what the put in context as they work