Hacker News

Retr0id, last Monday at 4:17 PM (4 replies)

This seems anecdotal, just with extra words. I'm fairly sure this is simply the "wow, this is so much better than the previous-gen model" effect wearing off.


Replies

codessta, last Monday at 4:26 PM

I've always believed the "post-honeymoon new model phase" is a real thing, but if you look at their analysis of how often the postEdit hooks fire, plus the fact that Anthropic has started obfuscating thinking blocks, it seems fishy and not just vibes.

rishabhaiover, last Monday at 4:23 PM

Nope, there is a categorical degradation in output quality, especially on medium- to high-effort thinking tasks.

gchamonlive, last Monday at 4:22 PM

What about the evidence in the analysis?

rzmmm, last Monday at 4:29 PM

I suspect you might be right, but I don't really know. Wouldn't these alleged regressions be trivial to confirm with benchmarks?