Hacker News

dang today at 6:37 PM

[stub for offtopicness]

[see https://news.ycombinator.com/item?id=48416020 for how all this happened in the first place]


Replies

logicprog today at 12:58 PM

Some notes on this:

- I used GLM 5.1 to help with the coding and math for this.

- However, I explicitly dictated where the data should be pulled from (GitHub, Bugzilla, mailing list), how it should be tagged and grouped, and what data to look at (e.g. bugs instead of regressions).

- Additionally, I consulted with my wife, who has a master's degree in statistics from Penn State University, about what sort of statistical methodology would be justified for this very limited data set while still giving as much information as possible.

- I know the website looks the way we stereotypically expect vibe-coded websites to look, but I actually explicitly asked for that. The original HTML design looked like a website from 1995, and I just prefer how this looks. It's pretty!

ex-aws-dude today at 6:22 PM

So the original unfounded claim has 400+ comments because it's perfect HN ragebait.

The author provides evidence to the contrary and the HNers won't even engage with it, instead just talking about the writing of the article in classic HN bikeshedding fashion.

How about after that we talk about the formatting of the website and the colors?

This site is really going downhill.

Where is the accountability for your own opinions?

Are you guys only upvoting things that confirm your existing gripes?

dang today at 5:59 PM

This submission was heavily flagged, presumably because the article sounded like genai. But the article now says the following:

> After posting this on Hacker News and receiving almost no substantive input, discussion, or response on the actual content of the article, I decided to rewrite all of the prose in my own voice.

I've therefore turned off the flags and hopefully people can actually now discuss the claims/findings being reported.

roywiggins today at 12:57 PM

> A simple distributional analysis of every rsync release with bug data. No model. No assumptions. Just placement.

If you want me to read your analysis, you are going to have to make it not read like Claude wrote it. What does "placement" even mean here?

tappio today at 1:23 PM

A lot of people are criticizing this because it's heavily written with an LLM, but I mean, if someone had produced this piece pre-LLM, would they criticize it? Is the critique due to the use of an LLM or due to the content being truly hard to follow? I read it, and I would say there are some problems with the writing, but it's not a bad piece.

Of course this is a bigger problem, as it's now harder to distinguish "AI slop" from "content co-authored with AI that is carefully reviewed" at a quick glance, and the "AI smell" is quite off-putting. My initial reaction was also negative, but after skimming it and reading the summaries, I found it a decent summary, which also... says something about this thread, the content of the blog post, the discussion, and the strong feelings people have developed around the use of LLMs.

Anyhow, it would be good to disclose the repo with the code for the statistics and the use of an LLM in the writing right up front: which model, why it was used to do the writing, etc. It's enough to say "I think it writes better than I do" or "I was in a hurry, sorry" or whatever, but it really should be disclosed. It reads as more honest.

ps. really... that sideways scroll? plz fix it.

mschuster91 today at 12:57 PM

This article reeks of LLM "assistance" at the very least.

Please, why can't people write stuff by hand themselves any more? It's a good analysis but how can I trust it without reviewing everything myself?!

sfink today at 1:47 PM

Wow.

I am pretty insensitive to AI writing. I have never commented before about something sounding like AI, because mostly I don't notice. But this was so over the top that I spent the whole article trying to decide whether it was an intentional parody of AI writing style.

This article's language is not en-US. It's not en-BR. It's en-SLOP.

Yes, that was my clumsy attempt at AI parody. Here's another: this article doesn't just have AI tells. It is AI tells.

Every sentence is saturated with AI style. Perhaps the author is so AI-indoctrinated that they can't see this? It doesn't read as even vaguely plausible human writing. Which is mightily ironic given the thesis of "AI generated stuff is just fine, m'kay?" The writing style does more to defeat its conclusion than the analysis itself.

As for the substance of the analysis, it seems pretty good to me but I see some flaws that weaken it a bit.

The presence of "The Outlier Nobody Noticed" proves nothing and deserves no more than a passing mention. A random release introduced way more bugs than the Claude-containing releases. That provides evidence that Claude doesn't introduce more bugs only if your hypothesis is a very naive "AI is the only thing that can ever increase bug introduction rates."

The whole analysis has very limited data. It's necessarily based off a single pair of releases at the very end of the chronological timeline. You would never be able to reject a null hypothesis based only on that, so it's even less sound to present it as proving the null hypothesis. (By the same token, it would be incorrect for critics to claim that it proves their point. Did anyone claim this, though? The heated complaints seemed more based on priors about AI code.)

"The critics' claim is a simple comparison: did the rate go up?" That's reductive. For one, these releases are known to be in reaction to a flood of (AI-discovered!) security reports, which is a novel situation and in fact is a huge confound to anyone arguing about what those two releases mean -- they're both heavily AI-written, but in response to an unusual situation. When the samples are only drawn from a distinct scenario, statistical analysis can only speak to the quality of code in that scenario.

Also, another reasonable hypothesis could be: AI-written code has bugs of a different flavor that bothers users more. It's optimized for passing tests and convincing people and AIs that security holes are closed, which means other considerations like preserving functionality can more easily be regressed as compared to if humans were doing it. (If true, it still doesn't support the claim that depending on AI code is a catastrophe, fwiw.)

I'm not arguing the conclusion is wrong. I'm saying the analysis proves far less than it claims to. As for whether it's a debacle for rsync to become dependent on AI code generation, I think that's a reasonable debate to have but it's not going to be resolved this reductively.

duk3luk3 today at 12:56 PM

This article is unfortunately unreadable because all of the prose is unfiltered LLM slop.

volume_tech today at 1:04 PM

[flagged]

perching_aix today at 1:02 PM

[dead]