Another bunch of dead giveaways in code bases with READMEs are the repetitive:
- "No X, No Y, No Z." pattern
- "Here is X - it makes Y"
The worst and most obvious one is the constant overuse of emoji ticks and crosses.
/* This function doesn't return an int. It doesn't return a float. It doesn't return a char. It doesn't ret-- */
Alternatively, no one sounds like an LLM; an LLM sounds like someone, typically someone close to the median of the training corpus. If AI were genuinely capable of novelty, it would be a big deal. Tech bros having enough work ethic to design new detectable prose for an LLM is a massive reach with no real evidence supporting it. Otherwise, why do tech bros only tackle the easier issues, the things we have massive, well-labelled corpora for? Why is it never dishwashing and folding laundry?
I put it to you: if you see a trope in AI writing, it's because that trope appeared in the training corpus. So sure, being prejudiced against it lets you catch some AI, but you'll also flag human output. I think that may not be worth it in the end.
For calibration purposes, I offer you a pre-LLM README I wrote that includes an em-dash* followed by "No X, No Y, No Z": https://github.com/DavidBuchanan314/stelf-loader
*Actually a hyphen, but it's functioning as an em-dash.