techpression • today at 8:56 AM • 3 replies

Just last week Opus 4.5 decided that the way to fix a test was to change the code so that everything else but the test broke.

When people say "fix stuff", I always wonder whether it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).

Replies

viraptor • today at 9:21 AM

Sure, I get an occasional bad result from Opus; then I revert and try again, or ask it for a fix. Even with a couple of restarts, it's going to be faster than me on average. (And that's ignoring the situations where I have to restart myself.)

Basically, you're saying it's not perfect. I don't think anyone is claiming otherwise.

baq • today at 11:04 AM

Nice. Did it realize the mistake and correct it?

simonw • today at 9:01 AM

What did Opus do when you told it that it shouldn't have done that?