mpeg • yesterday at 5:33 PM • 3 replies

It's not even very usable... I tried 2 different chats and both eventually got stopped due to the safeguards.

One was a piece of code I gave it to improve. It did so and then started writing tests, some of which tested security, so the safeguards triggered.

Another was one of the cryptography puzzles I use to test new models, which are hard to one-shot and have no public solution anywhere. It completely refused to even try to solve it.

Replies

gavinray • yesterday at 6:58 PM

I tried 2 chats and it declined both.

- 1st chat asked about the most likely mechanisms of a minor shoulder injury

- 2nd chat asked about optimal bloodwork testing markers

Erem • yesterday at 6:04 PM

So the degradation to Opus 4.8 from the article isn't happening in practice?

CSSer • yesterday at 6:56 PM

Oh joy. A model whose safeguards make it prone to writing code that makes your systems less safe. How brilliant!
