The fact that the last few commits were attributed to Claude doesn't mean the previous ones didn't use it.
Also, if you wrote a paper drawing statistical conclusions from a whole 2 datapoints, you'd be laughed out of the room.
> Also, if you wrote a paper drawing statistical conclusions from a whole 2 datapoints, you'd be laughed out of the room.
First of all, I'm using methods appropriate for that little data. Second, since I'm only trying to show there's no evidence for the anti-AI hypothesis (not to disprove it, or to prove the null hypothesis), that's sufficient in itself. Also, I wonder why nobody said anything like what you're saying ("there's too little data to tell") in response to all the absolutist claims that AI caused rsync to get worse.
> The fact that the last few commits were attributed to Claude doesn't mean the previous ones didn't use it.
At this point, you're just positing Russell's teapot: you'll keep assuming that more and more of the code was "secretly" Claude, with no evidence for it and no reason to think so, just because you started from the assumption that Claude makes things worse and you want to find a way to prove it.
Why not? Claude marks its commit messages. That there were none, and then there were, seems a signal.
Especially since, if the earlier commits were so clearly AI-authored yet lacked the Claude marker, surely you or anyone else would be able to spot them. You could say, "commit X does not have the Claude commit marker, yet it was AI-written." But for all the speculation in this thread, I haven’t seen anyone actually do that. It's possible that the rsync maintainers used AI to assist but reviewed and edited the code themselves, as many devs do. If so, the stats in this article are still notable: there are no poor-quality outliers that can reliably be attributed to AI. And if one specific release (3.4.0) was AI-assisted, the subsequent releases, which presumably had as much AI as this speculative hidden-AI release, only show improvement and thus act as a pro-AI argument.
The blog has many more than two datapoints. It compares many releases. You’re looking at a two-way comparison, not two datapoints.