fallmonkey • yesterday at 11:03 PM

Appreciate the response! I noticed the same thing when I ran tau2 myself on gpt-5 and 4.1: gpt-5 is really good at reading tool results and interleaving them with its thinking, while 4.1/o3 struggle to decide on the proper next tool to call, even with thinking. To some extent, gpt-5 is almost too good at figuring out the right tool to use in one go. Amazing progress.

alt Hacker News