Hacker News

mpeg yesterday at 5:33 PM

It's not even very usable... I tried 2 different chats and both eventually got stopped by the safeguards.

One was a piece of code I gave it to improve. It did so and then started writing tests, some of which tested security, so the safeguards triggered.

Another was one of the cryptography puzzles I use as new model tests, which are hard to one-shot and have no public solution anywhere. It completely refused to even try to solve it.


Replies

gavinray yesterday at 6:58 PM

I tried 2 chats and it declined both.

- 1st chat asked about the most likely mechanisms of a minor shoulder injury

- 2nd chat asked about optimal bloodwork testing markers

Erem yesterday at 6:04 PM

So the degradation to Opus 4.8 from the article isn't happening in practice?

CSSer yesterday at 6:56 PM

Oh joy. A model whose safeguards make it prone to writing code that makes your systems less safe. How brilliant!