CuriouslyC • last Tuesday at 5:26 PM • 6 replies

I dislike Anthropic but I wouldn't argue 4.8 isn't an improvement on 4.5/4.6. Your tasks just might not typically need the extra intelligence.

Replies

jorl17 • last Tuesday at 5:44 PM

Opus 4.7/4.8 often over-engineers on my setups, plus:

- It talks a LOT more like GPT models. You know: wrinkle, shape, gate, coarse, scope, gap, path, production-ready-workflow-of-the-day, and so on -- "that's expected, a consequence of the previous like-driven workflow". If I wanted to get a headache using AI I would have gone with GPT in the first place!

- It outputs text in a way that's much harder to follow. I can't exactly say what it is. Maybe a bit of everything? Bolds are missing, bullet points are gone, paragraphs are bland and too long, and it doesn't feel like a model programming with me, but rather like a somewhat full-of-himself grandpa developer looking down on me. It's very weird to describe this, but it is definitely how I feel.

Granted, this could totally be because of the way it reacts to the prompts now. We've got a rather large corpus of skills and "rules and good practices" that Opus 4.6 responded to great, and maybe the new models just turn into this when fed with them... I don't know.

Either way, with Opus 4.6 being as good as it is, I need Fable to be a significant step up to justify a price increase. If it can get me to babysit Opus a little bit less on some stuff, it might be worth it. Otherwise, I'm very happy with Opus 4.6 and hope they don't deprecate it.

taormina • last Tuesday at 5:38 PM

I'd argue that 4.8 is a straight downgrade for every type of task I've tried. It's been a gamble at this point. If 4.6 stops being available, I'm out.

coronapl • last Tuesday at 7:03 PM

Reading so many contrary positions about which model is better or worse shows how difficult it is to measure intelligence based on personal experiences. Of course, benchmarks try to make the process as objective as possible, but they often don't correlate with our personal experiences.

The other day 4.6 was fantastic for x task. Today, 4.6 overengineered everything and I had to revert all my changes. When evaluating models, perhaps it makes sense to consider luck as an ingredient before reaching any personal conclusion.

surgical_fire • last Tuesday at 5:45 PM

I actually experience 4.8 as worse than 4.6 for everyday coding tasks.

dcchambers • last Tuesday at 5:30 PM

IME Opus 4.8 (and 4.7) is often a downgrade from 4.6. I find that it tends to overthink and overcomplicate things.

OtomotO • last Tuesday at 7:08 PM

Lol. If you're doing anything non-trivial that's not a CRUD webapp but, e.g., some physics simulation or high-performance GPU code, any and all models I've tried suck.

They are not just leagues behind what experts would code, they are not even playing the same game.

Which is to be expected, as there isn't as much physics or high-performance GPU code available as there is for your typical CRUD API and JS frontend.
