Hacker News

pacha3000 yesterday at 7:52 PM, 6 replies

I'm usually the first to get tired of everyone saying "uuuh, it became dumber" about every model, because I didn't believe them

... until this week! Opus has been struggling worse than Sonnet these last two weeks.


Replies

saghm yesterday at 9:57 PM

Forget the agent itself being dumber: right now I'm getting an "API error: usage limit exceeded" message whenever I try anything despite my usage showing as 26% for the session limit and 8% for the week (with 0/5 routines, which I guess is what this thread is about). This is with the default model and effort, and Claude Code is saying I need to turn on extra usage for it to work. Forget that, I just canceled my subscription instead.

There's utility in LLMs for coding, but having literally the entire platform vibe-coded is too much for me. At this point, I might genuinely believe they're not intentionally watering anything down, because it's incredibly believable that they just have no clue how any of it works anymore.

girvo yesterday at 9:41 PM

My favourite was Opus 4.6 last night (to be fair, peak IST time, late afternoon my time), on the first prompt with a small context: it jams a copy-pasted function in between a bunch of import statements, doesn't even wire up its own function, and calls it done. Wild, I've not seen failure states like that since old Sonnet 4.

jpcompartir yesterday at 9:10 PM

Likewise, I foolishly assumed everybody else was just doing it wrong.

But this week I've lost count of the times I've had to say something along the lines of: "Can you check our plan/instructions, I'm pretty sure I said we need to do [this thing] but you've done [that thing]..."

And I get hit with a "You're absolutely right...", which virtually never used to happen for me. I think maybe once since Opus 4-6.

comboy yesterday at 8:21 PM

Pretty reassuring to hear that. I was skeptical too; there are a lot of variables, like some crap added to memory, a specific skill, or custom instructions interfering with the workflow, and whatnot. But now it's been like a toddler that consumes money when talking.

qingcharles yesterday at 9:56 PM

Is it? Or is it the task you're trying to do? Opus 4.6 has been staggeringly good for me this last week, both inside Claude Code and through Antigravity until I used up my quota.

combyn8tor yesterday at 9:54 PM

In my experience Opus and Claude have declined significantly over the past few weeks. It actually feels like dealing with an employee that has become bored and intentionally cuts corners.
