Hacker News

techpression, today at 8:56 AM (3 replies)

Just last week Opus 4.5 decided that the way to fix a test was to change the code so that everything else but the test broke.

When people say "fix stuff" I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).


Replies

viraptor, today at 9:21 AM

Sure, I get an occasional bad result from Opus - then I revert and try again, or ask it for a fix. Even with a couple of restarts, it's going to be faster than me on average. (And that's ignoring the situations where I have to restart myself.)

Basically, you're saying it's not perfect. I don't think anyone is claiming otherwise.

baq, today at 11:04 AM

Nice. Did it realize the mistake and correct it?

simonw, today at 9:01 AM

What did Opus do when you told it that it shouldn't have done that?