Hacker News

bob1029 yesterday at 5:35 PM | 1 reply

> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months...

This sounds suspiciously like a capacity story masquerading as a safety story.


Replies

azan_ yesterday at 7:26 PM

Approx. 5% of sessions? That's insanely high.