Hacker News

denysvitali last Saturday at 6:56 AM

Better link: https://iquestlab.github.io/

But yes, sadly, it looks like the agent cheated during the eval.


Replies

denysvitali last Saturday at 10:54 AM

According to https://github.com/IQuestLab/IQuest-Coder-V1/issues/14#issue... the result is still good after fixing the cheating problem: 76.2% (down from 81.4%), which still beats Opus 4.5 (74.4%)!

s-macke last Saturday at 7:17 AM

The link didn’t get enough votes a few days ago.
