Some notes on this:
- I used GLM 5.1 to help with the coding and math for this.
- However, I explicitly dictated where the data should be pulled from (GitHub, Bugzilla, mailing list), how it should be tagged and grouped, and what data to look at (e.g. bugs instead of regressions)
- Additionally, I consulted with my wife, who has a master's degree in statistics from Penn State University for what sort of statistical methodology would be justified for this very limited data set, while still giving as much information as possible.
- I know the website looks like we stereotypically consider vibe-coded websites to look, but I actually explicitly asked for that. The original HTML design looked like a website from 1995, and I just prefer how this looks. It's pretty!
I'd suggest writing the lead-in yourself and boxing AI prose separately from your prose in the analysis for future articles. You can give the humanized summary/eli5/key points, then have "details according to AI" boxes that go into nitty-gritty. People seem to dislike AI ghostwriting, but most of these people still use AI, so perhaps keeping authorship clear and separate will avoid some of the flak.
Even if everything in the article is true you should not use AI to write this. A analogy would be tobacco company report on how smoking isn’t so bad for you.
I really struggle to believe you wrote text like:
> A simple distributional analysis of every rsync release with bug data. No model. No assumptions. Just placement.