
WhyIsItAlwaysHN · today at 10:58 AM

This result sounds unsurprising at this point, now that we have models that can reliably use tools.

Some part of RL training must focus on response length. I would also guess that Anthropic and OpenAI have an incentive to shorten responses without sacrificing user satisfaction/retention.
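
To make that concrete, here's a rough sketch of the kind of length-shaped reward I'm imagining; the function name and the penalty coefficient are made up for illustration, not anyone's actual training setup:

    # Hypothetical shaped reward: keep the task reward, subtract a small
    # per-token cost so the policy is nudged toward shorter answers.
    def shaped_reward(task_reward: float, num_output_tokens: int,
                      length_penalty: float = 0.001) -> float:
        return task_reward - length_penalty * num_output_tokens

As long as the penalty stays small relative to the task reward, the model keeps solving the task but learns to do it with fewer tokens.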

For example, I would be more satisfied if Claude Code didn't bother executing a side-effect-free script that produces no output. Embodying the concept of silence is semantically close to predicting the output of an empty program, so the most efficient answer is to say nothing at all.
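
A toy example of the kind of script I mean (hypothetical, just to illustrate): it has no side effects and prints nothing, so a model could predict the empty output directly instead of running it:

    # Side-effect free and silent: running this prints nothing and changes nothing.
    # Predicting its output (empty) doesn't require executing it.
    def sum_of_squares(n):
        return sum(i * i for i in range(n))

    result = sum_of_squares(10)  # computed, but never printed or written anywhere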

Even in the past, though, similar tests gave output like "says nothing". I think that points more towards optimizing for fewer tokens than towards the special understanding being attributed to the latest models.