dcchambers • last Tuesday at 5:30 PM • 2 replies

IME Opus 4.8 (and 4.7) is often a downgrade from 4.6. I find that it tends to overthink and overcomplicate things.

Replies

aspenmartin • last Tuesday at 5:33 PM

Yes, but there’s a reason we don’t evaluate these models this way and instead do it as carefully and thoughtfully as we can, at scale. Human evaluations are important, but they are an absolute minefield of footguns. 4.8 is not a downgrade from 4.6; there is an insane amount of hard data that contradicts that claim.

BoorishBears • last Tuesday at 5:40 PM

"Fable 5" is Opus 4.7, and the Opus 4.7 we got is a Sonnet sized model on a stronger base.

That's where all the regressions and inconsistent experiences stem from: RL can still only go so far vs. having more parameters.
