It's not even very usable... I tried 2 different chats and both eventually got stopped due to the safeguards
One was a piece of code I gave it to improve. It did so and then started writing tests, some of which tested security, so the safeguards triggered.
Another was one of the cryptography puzzles I use to test new models. They're hard to one-shot and there's no public solution anywhere. It completely refused to even try to solve it.
So the degradation to Opus 4.8 from the article isn't happening in practice?
Oh joy. A model whose safeguards make it prone to writing code that makes your systems less safe. How brilliant!
I tried 2 chats and it declined both.
- 1st chat asked about the most likely mechanisms of a minor shoulder injury
- 2nd chat asked about optimal bloodwork testing markers