When a good programmer makes multiple refactoring passes, each pass leaves the code more elegant, and therefore easier to understand and improve in the next pass.
Claude is biased toward adding lines of code to a project rather than making it more concise. Consequently, each refactoring pass leaves the code more difficult to untangle and harder to improve.
Ideally, in this experiment, only the first few passes would result in changes, mostly shrinking the project size, and from then on Claude would change nothing, just like a very good programmer.
This is by far the biggest problem with developing with Claude. Anthropic should laser-focus on fixing it.
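For anyone who wants to put numbers on the ideal behavior described above (a few shrinking passes, then no further changes), here is a rough sketch of how the experiment could be measured. Everything in it is an assumption for illustration: `run_passes` and `refactor_pass` are hypothetical names, and `strip_blank_lines` is only a toy stand-in for a real refactoring step such as a call to a model.

```python
from typing import Callable, List, Tuple


def run_passes(
    source: str,
    refactor_pass: Callable[[str], str],
    max_passes: int = 10,
) -> Tuple[str, List[int]]:
    """Apply refactor_pass repeatedly until the output stops changing.

    Returns the final source and the line count after each pass
    (index 0 is the original). Stops early at a fixed point, or after
    max_passes if the passes never settle.
    """
    line_counts = [len(source.splitlines())]
    for _ in range(max_passes):
        updated = refactor_pass(source)
        if updated == source:
            # Fixed point: this pass changed nothing, so we are done.
            break
        source = updated
        line_counts.append(len(source.splitlines()))
    return source, line_counts


def strip_blank_lines(source: str) -> str:
    """Toy stand-in for a refactoring pass (hypothetical): drop blank lines."""
    return "\n".join(line for line in source.splitlines() if line.strip())


if __name__ == "__main__":
    final, counts = run_passes("a = 1\n\n\nb = 2\n", strip_blank_lines)
    print(counts)
```

A well-behaved refactorer should produce a short, non-increasing list of line counts and then stop; a list that keeps growing until `max_passes` is hit would be the failure mode described here.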