fallmonkey, yesterday at 11:03 PM

Appreciate the response! I noticed the same thing when I ran tau2 myself on gpt-5 and 4.1: gpt-5 is really good at looking at tool results and interleaving them with thinking, while 4.1 and o3 struggle to decide on the proper next tool to use, even with thinking. To some extent, gpt-5 is too good at figuring out the right tool to use in one go. Amazing progress.