> 5 and 5.1 both felt overfit and would break down and be stubborn when you got them outside thei...

deaux • today at 1:47 AM • 1 reply • view on HN

> 5 and 5.1 both felt overfit and would break down and be stubborn when you got them outside their lane. As opposed to Opus 4.5 which is lovely at self correcting.

This is simply the "openness vs directive-following" spectrum, which as a side-effect results in the sycophancy spectrum, which still none of them have found an answer to.

Recent GPT models follow directives more closely than Claude models, and are less sycophantic. Even Claude 4.5 models are still somewhat prone to "You're absolutely right!". GPT 5+ (API) models never do this. The byproduct is that the former are willing to self-correct, and the latter is more stubborn.

Replies

baq • today at 7:00 AM

Opus 4.5 answers most of my non-question comments with ‘you’re right.’ as the first thing in the output. At least I’m not absolutely right, I’ll take this as an improvement.

alt Hacker News

Replies