Better link: https://... | alt Hacker News

denysvitali • last Saturday at 6:56 AM

But yes, sadly, it looks like the agent cheated during the eval.

denysvitali • last Saturday at 10:54 AM

According to https://github.com/IQuestLab/IQuest-Coder-V1/issues/14#issue... the result is still good after fixing the cheating problem: 76.2% (down from 81.4%), which still beats Opus 4.5 (74.4%)!!

s-macke • last Saturday at 7:17 AM

The link didn’t get enough votes a few days ago.
