james_marks • today at 4:57 PM • 8 replies

> One of the most prominent improvements in Opus 4.8 is its honesty. We train all our models to be honest—for instance, to avoid making claims that they can’t support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims.

Would be awesome if true

Replies

majormajor • today at 5:03 PM

"Honesty" seems like unnecessary (and annoying) anthropomorphism there. I don't think there's any intent of fraud or deception in outputs from these things, just overreaching of prediction. Based on the latter part of the paragraph, I wish they'd just say something like "less likely to skip steps or overemphasize thin evidence" in the first place.

Don't play to the sci-fi "this thing's trying to outsmart me" tropes.

HAL3000 • today at 5:21 PM

Yeah, it's super annoying. A few days ago, Opus 4.7 created a plan with several items on it, including an auth feature. It then went through the plan and reported that it had created the auth feature, that everything was secure, and that the tests passed.

The issue was that it hadn't actually implemented the auth feature. After I confronted it about this, it admitted that it indeed hadn't done it and said it would implement it now.

If we had just trusted its output, we would now have a security vulnerability in production, allowing anyone to access other people's accounts.

ealready_value • today at 5:12 PM

Opus 4.7 was already trying hard to appear honest. Most conversations I have with it about advice or focusing an opinion include "my honest take" or "my honest opinion".

The problem is that I once asked it "I'm thinking about A or B" twice, once with "I like A more but suspect B would be best" and a second time with them reversed. Not surprisingly, both times it chose the one I said I suspected was best as its honest opinion.

legitster • today at 5:17 PM

Part of the problem is also garbage-in/garbage-out. There's a lot of human information on the internet that is also confidently wrong.

I use Sonnet a lot for learning about history or contextualizing news topics. It's really good at this for the most part. But there are a lot of topics where "consensus" between either academics or journalists is really "one secondary source which gets repeated a lot".

benzible • today at 5:15 PM

In the context of Claude Code, "honest" usually means that the agent took a shortcut, skipped requirements, etc. It's the model giving itself credit for admitting to failing rather than actually doing what was requested.

pants2 • today at 5:19 PM

[dead]

soperj • today at 5:02 PM

My guess is that Claude Opus 4.8 wrote that and is lying to you.

malfist • today at 5:00 PM

And yet, every release has claimed lower hallucination rates. But they persist.
