> [...] newer Claude models sometimes call Pi’s edit tool with extra, invented fields in the nested edits[] array
> My strongest hypothesis is that this is not random deterioration but a training artifact. [...] Anthropic’s own client appears to expect and accept a fair amount of slop and repairs it, mostly silently
> If reinforcement learning happens in a harness like that, or a simulation of one, then slightly malformed tool calls can still complete the task and receive reward.
> Worse, the model may become very strongly adapted to the canonical Claude Code edit tool shape.
> Tool schemas are somewhere in the distribution and some shapes are close to what the model saw during post-training and some are far away.
Great article.
Interesting root-cause hypothesis. Couldn't one simply strip the slop handling from the RL environment's harness to avoid this, though?
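To make the question concrete, here is a rough sketch of the difference I mean. It is only an illustration: the field names (`path`, `old_text`, `new_text`) and both function names are made up for this example, and I don't know what the real Pi or Claude Code schemas or repair logic look like.

```python
# Hypothetical schema: these field names are illustrative assumptions,
# not the real Pi or Claude Code edit tool fields.
ALLOWED_EDIT_FIELDS = {"old_text", "new_text"}


def lenient_parse(call: dict) -> dict:
    """Tolerant harness: silently drop unknown fields so the call still succeeds."""
    edits = [
        {k: v for k, v in edit.items() if k in ALLOWED_EDIT_FIELDS}
        for edit in call.get("edits", [])
    ]
    return {"path": call["path"], "edits": edits}


def strict_parse(call: dict) -> dict:
    """Strict harness: reject the call if any edit has a field outside the schema."""
    for i, edit in enumerate(call.get("edits", [])):
        extra = set(edit) - ALLOWED_EDIT_FIELDS
        if extra:
            raise ValueError(f"edits[{i}]: unexpected fields {sorted(extra)}")
    return {"path": call["path"], "edits": list(call.get("edits", []))}


# A tool call with one invented field in the nested edits[] array.
call = {
    "path": "main.py",
    "edits": [{"old_text": "a", "new_text": "b", "reason": "invented field"}],
}
```

With the lenient version the malformed call goes through and the task can still complete, so nothing in the reward would discourage the extra field. With the strict version the same call raises an error that the environment could surface to the model or count as a failure.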
I do agree on the walled garden being built here. Proprietary frontier models performing best in proprietary harnesses makes sense for Anthropic's interests.