Ok, but can it play Factorio?
I'm curious if others are finding that there's a comfort in staying within the Claude ecosystem because when it makes a mistake, we get used to spotting the pattern. I'm finding that when I try new models, their "stupid" moments are more surprising and infuriating.
Given this tech is new, the experience of how we relate to their mistakes is something I think a bit about.
Am I alone here, or are others finding themselves more forgiving of "their preferred" model provider?
The fact that the post singled out SWE-bench at the top gives the opposite impression from the one they probably intended.
I hate on Anthropic a fair bit, but the cost reduction, quota increases and solid "focused" model approach are real wins. If they can get their infrastructure game solid, improve Claude Code performance consistency and maintain high levels of transparency, I will officially have to start saying nice things about them.
80% vs. 77% is not that much of a difference lol
They lowered the price because this is a massive land grab and is basically winner-take-all.
I love that Anthropic is focused on coding. I've found their models to be significantly better at producing code similar to what I would write, meaning it's easy to debug and grok.
Gemini does weird stuff and while Codex is good, I prefer Sonnet 4.5 and Claude Code.
this is quite a good
The first chart is straight from "how to lie in charts".
Got the river crossing one:
https://claude.ai/chat/0c583303-6d3e-47ae-97c9-085cefe14c21
Still fucked up one about the boy and the surgeon though:
This is great. Sonnet 4.5 has degraded terribly.
I can get some useful stuff from a clean context in the web UI but the CLI is just useless.
Opus is far superior.
Today Sonnet 4.5 suggested verifying remote state file presence by creating an empty one locally and copying it to the remote backend. Da fuq? University level programmer my a$$.
And it seems like it has degraded this last month.
I keep getting braindead suggestions and code that looks like it came from a random word generator.
I swear it was not that awful a couple of months ago.
Opus cap has been an issue, happy to change and I really hope the nerf rumours are just that: unfounded rumours, and that the degradation has a valid root cause.
But honestly Sonnet 4.5 has started to act like a smoking pile of sh**t