IQuest-Coder: A new open-source code model beats Claude Sonnet 4.5 and GPT 5.1 [pdf]

175 points • by shenli3514 • last Saturday at 4:01 AM • 45 comments • view on HN

Comments

denysvitali • last Saturday at 6:56 AM

Better link: https://iquestlab.github.io/

But yes, sadly it looks like the agent cheated during the eval

➕ show 2 replies

sabareesh • last Saturday at 5:55 AM

TL;DR is that they didn't clean the repo (.git/ folder), model just reward hacked its way to look up future commits with fixes. Credit goes to everyone in this thread for solving this: https://xcancel.com/xeophon/status/2006969664346501589

(given that IQuestLab published their SWE-Bench Verified trajectory data, I want to be charitable and assume genuine oversight rather than "benchmaxxing", probably an easy to miss thing if you are new to benchmarking)

https://www.reddit.com/r/LocalLLaMA/comments/1q1ura1/iquestl...

➕ show 3 replies

brunooliv • last Saturday at 6:25 AM

GLM-4.7 in opencode is the only opensource one that comes close in my experience and probably they did use some Claude data as I see the occasional You’re absolutely right in there

➕ show 2 replies

adastra22 • last Saturday at 5:22 AM

A 40B weight model that beats Sonnet 4.5 and GPT 5.1? Can someone explain this to me?

➕ show 6 replies

simonw • last Saturday at 6:34 AM

Has anyone run this yet, either on their own machine or via a hosted API somewhere?

squigz • last Saturday at 12:36 PM

This is a lie, so why is it still on the front page?

alt Hacker News

IQuest-Coder: A new open-source code model beats Claude Sonnet 4.5 and GPT 5.1 [pdf]

Comments