Hacker News

moomoo11 | today at 6:19 PM | 3 replies

yeah but it also made some tests pass by changing the tests. i'm not super familiar with it so i'll dig in more on the weekend, but it seems sus pending more review. i've had ai do similar things that i caught in manual review. cheating the test is bad.


Replies

gmueckl | today at 6:48 PM

It is well known that agents can cheat or go off on tangents and not recover. Just recently, one deleted a bunch of code files without my asking. The code wasn't even used anywhere.

atonse | today at 7:07 PM

That's why they've merged it into canary so they can continue working on it.

tuo-lei | today at 6:47 PM

[dead]