> LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation!
That's a bold claim. Do they have data to back it up? I'd only have the confidence to say this after testing it against multiple LLM outputs. And does it really work on, e.g., the em dash leaderboard of HN, or on people who tell an LLM not to use these ten LLM-y writing cliches? I would need to see their reasoning before I'd believe it.
I thought about it: a quick way to check whether something was written with an LLM is to feed an LLM the first half of the text and then let it complete the rest token by token. At every step, look not just at the single most likely next token but at the n most probable tokens. If one of them matches the token that actually appears in the text, pick it and continue. This way, I think, you can measure how often the model is "correct" in predicting text it hasn't yet seen.
I didn't test it and I'm far from an expert; maybe someone can challenge it?
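For concreteness, here is a minimal sketch of that idea in Python using the Hugging Face transformers library. It simplifies the pick-and-continue loop into a single teacher-forced pass: condition the model on the real text and count how often each actual token lands in the model's top-n predictions. The model choice (gpt2), top_n=10, and the function name top_n_hit_rate are all placeholder assumptions, not tested values:

    # Sketch of the top-n predictability check described above.
    # pip install torch transformers
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def top_n_hit_rate(text: str, top_n: int = 10, model_name: str = "gpt2") -> float:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        model.eval()

        ids = tokenizer(text, return_tensors="pt").input_ids  # shape (1, seq_len)
        with torch.no_grad():
            logits = model(ids).logits  # shape (1, seq_len, vocab_size)

        # For each position, check whether the *actual* next token is among
        # the model's top-n predictions given everything that came before it.
        total = ids.shape[1] - 1
        if total <= 0:
            return 0.0
        hits = 0
        for i in range(total):
            top = torch.topk(logits[0, i], top_n).indices
            if ids[0, i + 1] in top:
                hits += 1
        return hits / total

A high hit rate means the text is very "predictable" to the scoring model, which is the signal the idea above treats as evidence of LLM authorship; to match the "feed it half" framing, you would score only the positions in the second half. Caveats: when the real token falls outside the top n, this version keeps conditioning on the real text rather than on a sampled continuation; the result depends heavily on which model does the scoring; and short or formulaic text will look predictable regardless of who wrote it. (This is roughly the token-rank signal that the GLTR tool visualized.)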
I would be surprised if they had any data about this. There are so many ways LLMs can be involved, from writing everything, to making the text more concise, to just "simple proofreading". Detecting all of this with certainty is not trivial, and probably not possible with the tools we have today.
I am really surprised that people are surprised by this, and honestly, the reference in the RFD was so casual because this is probably the way that I use LLMs the most (so it very much comes from my own personal experience). I will add a footnote to the RFD to explain this, but just for everyone's benefit here: at Oxide, we have a very writing-intensive hiring process.[0]

Unsurprisingly, over the last six months, we have seen an explosion of LLM-authored materials (especially for our technical positions). We have told applicants to be careful about doing this[1], but they do it anyway. We have also seen this coupled with outright fraud (though less frequently).

Speaking personally, I spend a lot of time reviewing candidate materials, and my ear has become very sensitive to LLM-generated material. So while I generally only engage an LLM to aid in detection when I already have a suspicion, they have proven adept. (I also elaborated on this a little in our podcast episode with Ben Shindel on using LLMs to explore the fraud of Aidan Toner-Rodgers.[2])
I wasn't trying to assert that LLMs can find all LLM-generated content (which feels tautologically impossible?), just that they are useful for the kind of LLM-generated content that we seek to detect.
[0] https://rfd.shared.oxide.computer/rfd/0003
[1] https://oxide.computer/careers
[2] https://oxide-and-friends.transistor.fm/episodes/ai-material...