Hacker News

wxw | yesterday at 11:01 PM | 0 replies

> [...] newer Claude models sometimes call Pi’s edit tool with extra, invented fields in the nested edits[] array

> My strongest hypothesis is that this is not random deterioration but a training artifact. [...] Anthropic’s own client appears to expect and accept a fair amount of slop and repairs it, mostly silently

> If reinforcement learning happens in a harness like that, or a simulation of one, then slightly malformed tool calls can still complete the task and receive reward.

> Worse, the model may become very strongly adapted to the canonical Claude Code edit tool shape.

> Tool schemas are somewhere in the distribution and some shapes are close to what the model saw during post-training and some are far away.

Great article.

Interesting root cause hypothesis. Couldn't one simply strip the slop-handling from the RL env's harness to avoid this though?
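
A rough sketch of what stripping the slop-handling could look like, assuming the reward is computed inside the harness. Everything here is hypothetical: the field names `old_text` and `new_text`, the helpers `validate_strict` and `repair_lenient`, and the `reward` function are illustrative and not the actual tool shape of Pi or Claude Code. The strict path rejects invented fields in `edits[]`, so a malformed call earns nothing even if a forgiving client could have repaired it.

```python
# Hypothetical edit-tool schema; these field names are assumptions for illustration only.
ALLOWED_EDIT_FIELDS = {"old_text", "new_text"}


def validate_strict(call: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the call is well-formed."""
    edits = call.get("edits")
    if not isinstance(edits, list):
        return ["edits must be a list"]
    errors = []
    for i, edit in enumerate(edits):
        if not isinstance(edit, dict):
            errors.append(f"edits[{i}] must be an object")
            continue
        extra = sorted(set(edit) - ALLOWED_EDIT_FIELDS)
        missing = sorted(ALLOWED_EDIT_FIELDS - set(edit))
        if extra:
            errors.append(f"edits[{i}] has unknown fields: {extra}")
        if missing:
            errors.append(f"edits[{i}] is missing fields: {missing}")
    return errors


def repair_lenient(call: dict) -> dict:
    """What a forgiving client might do: silently drop unknown fields from each edit."""
    cleaned = [
        {k: v for k, v in edit.items() if k in ALLOWED_EDIT_FIELDS}
        for edit in call.get("edits", [])
    ]
    return {**call, "edits": cleaned}


def reward(call: dict, task_succeeded: bool) -> float:
    """Strict environment: a malformed call gets zero reward, even if the task succeeded."""
    return 1.0 if task_succeeded and not validate_strict(call) else 0.0


# A call with an invented "reason" field inside edits[].
bad_call = {
    "path": "a.py",
    "edits": [{"old_text": "x", "new_text": "y", "reason": "invented"}],
}
print(validate_strict(bad_call))
print(reward(bad_call, task_succeeded=True))
print(reward(repair_lenient(bad_call), task_succeeded=True))
```

The trade-off is that a strict environment gives sparser reward early in training, which may be one reason a lenient harness is tempting in the first place.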

I do agree on the walled garden being built here. Proprietary frontier models performing best in proprietary harnesses makes sense for Anthropic's interests.