This suggests that nobody was screening these papers in the first place. So is it actually significant that people are using LLMs in a setting without meaningful oversight?
These clearly aren't being peer-reviewed, so there's no natural check on LLM usage (unlike what we see in work published in journals).
When I was reviewing such papers, I didn't bother checking that 30+ citations were correctly indexed. I focused on the article itself, plus maybe one or two citations that actually mattered. That's it. Most citations sit next to an argument I already know is correct, so why would I bother checking them? What else do you expect? My job was to figure out whether the article's ideas were novel and interesting, not whether every citation was right.
Academic venues don't have enough reviewers. This problem isn't new, and as publication volumes increase, it's getting sharply worse.
Consider the unit economics. Suppose NeurIPS gets 20,000 submissions in a year, and suppose each paper should get three good reviews, so area chairs assign five reviewers per paper. That's 100,000 reviews to be written, which is a lot of work even before factoring in emergency reviewers.
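For concreteness, here's that back-of-envelope arithmetic as a tiny script. The submission count and reviewers-per-paper figure are the ones assumed above; the per-reviewer capacity is a hypothetical number added purely for illustration, not something from the thread.

```python
# Back-of-envelope sketch of the reviewing load described above.
papers = 20_000            # assumed NeurIPS-scale submission count for one year
reviewers_per_paper = 5    # assigned so that roughly three good reviews come back
total_reviews = papers * reviewers_per_paper
print(f"{total_reviews:,} reviews to write")  # 100,000

# Hypothetical yearly capacity per reviewer for this one venue (illustrative only):
reviews_per_reviewer = 5
print(f"{total_reviews // reviews_per_reviewer:,} reviewers needed for this venue alone")  # 20,000
```

Under those assumptions, a single venue already consumes the attention of tens of thousands of reviewers, and that is before the other conferences below are counted.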
NeurIPS is just one venue alongside CVPR, ICCV/ECCV, COLM, ICML, EMNLP, and so on. Not all of these conferences are as large as NeurIPS, but the field is smaller than you'd expect: I'd guess there are 300k-1M people in the world who are qualified to review AI papers.
Speaking as someone who reviews 20+ papers per year: we don't have time to verify every reference.
What we verify is whether the work is correct, and whether, assuming it is correct, it is worth publishing in the given venue.
There is still some trust that authors won't submit made-up material, though that trust is diminishing.