I ran the same test I ran on Opus 4.6: feeding it my whole personal collection of ~900 poems, spanning ~16 years.
It is a far cry from Opus 4.6.
Opus 4.6 was (is!) a giant leap, the largest since Gemini 2.5 Pro. It didn't hallucinate anything and produced honestly mind-blowing analyses of the collection as a whole.
Sonnet 4.6 feels like an evolution of whatever the previous models were doing. It is marginally better, in the sense that its mistakes seemed fewer and less severe, but ultimately it made all the usual ones: making things up, saying it'll quote a poem and then quoting another, getting time periods mixed up, etc.
My initial experiments with coding leave the same impression. It is better than previous similar models, but a long way from Opus 4.6. And I've really been spoiled by Opus.
Opus 4.6 is outstanding for code, and for the little I have used it outside of that context, outstanding there too. My productivity with code is at least 3x what I was getting with 5.2, and it can handle entire projects fairly responsibly. It doesn't patronize the user, and it makes a very strong effort to capture and follow intentions. Unlike 5.2, I've never had to throw out a day's work that it covertly screwed up by taking shortcuts and just guessing.