Hacker News

Over fifty new hallucinations in ICLR 2026 submissions

478 points by puttycat yesterday at 1:16 PM | 385 comments | view on HN

Comments

WWWWH yesterday at 7:50 PM

Surely this is gross professional misconduct? If one of my postdocs did this they would be at risk of being fired. I would certainly never trust them again. If I let it get through, I should be at risk.

As a reviewer, if I see the authors lie in this way why should I trust anything else in the paper? The only ethical move is to reject immediately.

I acknowledge that mistakes and so on are common, but this is a different league of bad behaviour.

ulrashida yesterday at 4:12 PM

Unfortunately, while catching false citations is useful, in my experience that's not usually the problem affecting paper quality. Far more prevalent are authors who mis-cite materials, either drawing support from citations that don't actually say those things or stripping the nuance away with cherry-picked quotes, simply because that is what Google Scholar suggested as a top result.

The time it takes to find these errors is orders of magnitude higher than checking if a citation exists as you need to both read and understand the source material.

These bad actors should be subject to a three-strikes rule: the steady corrosion of knowledge by these individuals is not an accident.

theoldgreybeard yesterday at 3:09 PM

If a carpenter builds a crappy shelf “because” his power tools are not calibrated correctly - that’s a crappy carpenter, not a crappy tool.

If a scientist uses an LLM to write a paper with fabricated citations - that’s a crappy scientist.

AI is not the problem; laziness and negligence are. There need to be serious social consequences for this kind of thing, otherwise we are tacitly endorsing it.

chistev yesterday at 5:46 PM

Last month, I was listening to the Joe Rogan Experience episode with guest Avi Loeb, who is a theoretical physicist and professor at Harvard University. He complained about the disturbingly increasing rate at which his students are submitting academic papers referencing non-existent scientific literature that was so clearly hallucinated by large language models (LLMs). They never even bothered to confirm their references and took the AI's output as gospel.

https://www.rxjourney.net/how-artificial-intelligence-ai-is-...

jameshart yesterday at 2:50 PM

Is the baseline assumption of this work that an erroneous citation is LLM hallucinated?

Did they run the checker across a body of papers before LLMs were available and verify that there were no citations in peer reviewed papers that got authors or titles wrong?

TaupeRanger yesterday at 2:51 PM

It's going to be even worse than 50:

> Given that we've only scanned 300 out of 20,000 submissions, we estimate that we will find 100s of hallucinated papers in the coming days.

currymj yesterday at 11:36 PM

I recommend actually clicking through and reading some of these papers.

Most of those I spot-checked do not give an impression of high quality. It's not just AI writing assistance: many seem to have AI-generated "ideas", often plausible nonsense. The reviewers often catch the errors and sometimes even the fake citations.

Can I prove malfeasance beyond a reasonable doubt? No. But I personally feel quite confident that many of the papers I checked are primarily AI-generated.

I feel really bad for any authors who submitted legitimate work but made an innocent mistake in their .bib and ended up on the same list as the rest of this stuff.

senshan yesterday at 7:50 PM

As many have pointed out, the purpose of peer review is not linting but assessing novelty and catching subtle omissions.

What incentives could be set to discourage such negligence?

How about bounties? A bounty fund set up by the publisher, with each submission required to come with a contribution to the fund. There would then be bounties for gross negligence, which could attract bounty hunters.

How about a wall of shame? Once negligence crosses a certain threshold, the name of the researcher and the paper would be put on a wall of shame for everyone to search and see?

Isamu yesterday at 3:15 PM

Someone commented here that hallucination is what LLMs do: by design they select statistically relevant patterns learned from the training set and mash them up into an output. The outcome is something that statistically resembles a real citation.

Creating a real citation is totally doable by a machine, though: it is just a matter of selecting the relevant text, looking up the title, authors, pages, etc., and putting that into canonical form. It's just that LLMs are not currently doing the work we ask for, but instead something similar in form that may look good enough.
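
For illustration, a rough sketch of that "look it up and put it in canonical form" step, assuming the public Crossref REST API (api.crossref.org); the query title and the BibTeX output format here are only examples, and a real tool would handle preprints and missing fields more carefully:

  import requests

  def lookup_bibtex(title):
      # Ask Crossref for the closest bibliographic match to the given title.
      resp = requests.get(
          "https://api.crossref.org/works",
          params={"query.bibliographic": title, "rows": 1},
          timeout=10,
      )
      resp.raise_for_status()
      items = resp.json()["message"]["items"]
      if not items:
          return None  # nothing indexed under a similar title
      work = items[0]
      authors = " and ".join(
          f"{a.get('family', '')}, {a.get('given', '')}" for a in work.get("author", [])
      )
      year = work.get("issued", {}).get("date-parts", [[None]])[0][0]
      # Emit a canonical-form entry built from the verified metadata.
      return (
          f"@article{{{work['DOI'].replace('/', '_')},\n"
          f"  title  = {{{work['title'][0]}}},\n"
          f"  author = {{{authors}}},\n"
          f"  year   = {{{year}}},\n"
          f"  doi    = {{{work['DOI']}}}\n"
          f"}}"
      )

  print(lookup_bibtex("Deep Residual Learning for Image Recognition"))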

noodlesUK yesterday at 8:44 PM

It astonishes me that there would be so many cases of things like wrong authors. I began using a citation manager that extracted metadata automatically (zotero in my case) more than 15 years ago, and can’t imagine writing an academic paper without it or a similar tool.

How are the authors even submitting citations? Surely they could be required to send a .bib or similar file? It would then be easy to quality-control, at a minimum verifying that the citations exist by looking up DOIs or similar.
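
To make that concrete, here is a minimal sketch of the kind of check a submission system could run on an uploaded .bib file, assuming the bibtexparser package (v1 API) and the public doi.org resolver; the file name is made up, and a real pipeline would also cross-check titles and authors rather than just DOIs:

  import bibtexparser
  import requests

  def check_bib(path):
      # Flag entries whose DOI does not resolve, or that have no DOI at all.
      with open(path) as f:
          db = bibtexparser.load(f)
      for entry in db.entries:
          doi = entry.get("doi")
          if not doi:
              print(f"[no DOI] {entry.get('ID', '?')}")
              continue
          # Some publishers block HEAD requests, so treat a non-200 status as
          # "needs a human look", not as proof of fabrication.
          r = requests.head(f"https://doi.org/{doi}", allow_redirects=True, timeout=10)
          status = "ok" if r.status_code == 200 else f"HTTP {r.status_code}"
          print(f"[{status}] {entry.get('ID', '?')}: {doi}")

  check_bib("references.bib")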

I know it wouldn’t solve the human problem of relying on LLMs but I’m shocked we don’t even have this level of scrutiny.

dclowd9901 yesterday at 3:19 PM

To me, this is exactly what LLMs are good for. It would be exhausting double checking for valid citations in a research paper. Fuzzy comparison and rote lookup seem primed for usage with LLMs.

Writing academic papers is exactly the _wrong_ usage for LLMs. So here we have a clear cut case for their usage and a clear cut case for their avoidance.

MarkusQ yesterday at 3:51 PM

This is as much a failing of "peer review" as anything. Importantly, it is an intrinsic failure, which won't go away even if LLMs were to go away completely.

Peer review doesn't catch errors.

Acting as if it does, and thus assuming the fact of publication (and where it was published) are indicators of veracity is simply unfounded. We need to go back to the food fight system where everyone publishes whatever they want, their colleagues and other adversaries try their best to shred them, and the winners are the ones that stand up to the maelstrom. It's messy, but it forces critics to put forth their arguments rather than quietly gatekeeping, passing what they approve of, suppressing what they don't.

ricardobeat today at 2:09 AM

One of the hallucinations reported in this work [1], a citation whose author list starts with David Rein, is flagged as having the other authors entirely made up. They are indeed absent from the original cited paper [2], but a Google search shows some of the same names featured in citations from other papers [3] [4].

Most of the names in these wrong attributions are actual people though, not hallucinations. What is going on? Is this a case of AI-powered citation management creating some weird feedback loop?

[1] https://app.gptzero.me/documents/54c8aa45-c97d-48fc-b9d0-d49...

[2] https://arxiv.org/pdf/2311.12022

[3] https://arxiv.org/html/2509.22536v3

[4] https://arxiv.org/html/2511.01191v1

thruifgguh585 yesterday at 3:51 PM

> crushed by an avalanche of submissions fueled by generative AI, paper mills, and publication pressure.

Run of the mill ML jobs these days ask for "papers in NeurIPS ICLR or other Tier-1 conferences".

We're well past Goodhart's law when it comes to publications.

It was already insane in CS - now it's reached asylum levels.

ineedasername yesterday at 4:43 PM

How can someone not be aware, at this point, that, sure, you can use the systems for finding and summarizing research, but that for each source you should take two minutes to find the source and verify it?

Really, this isn’t that hard and it’s not at all an obscure requirement or unknown factor.

I think this is much, much less "LLMs dumbing things down" and significantly more a shibboleth for identifying people who were already doing, or nearly doing, fraudulent research anyway. Those are the ones whose prior publications we should now go back and look at as very likely fraudulent as well.

btisler today at 12:37 AM

I’ve been working on tools that specifically address this problem, but from the level upstream of citation. They don’t check whether a citation exists — instead they measure whether the reasoning pathway leading to a citation is stable, coherent, and free of the entropy patterns that typically produce hallucinations.

The idea is simple:
• Bad citations aren’t the root cause.
• They are a late-stage symptom of a broken reasoning trajectory.
• If you detect the break early, the hallucinated citation never appears.

The tools I’ve built (and documented so anyone can use) do three things:
1. Measure interrogative structure — they check whether the questions driving the paper’s logic are well-formed and deterministic.
2. Track entropy drift in the argument itself — not the text output, but the structure of the reasoning.
3. Surface the exact step where the argument becomes inconsistent — which is usually before the fake citation shows up.

These instruments don’t replace peer review, and they don’t make judgments about culture or intent. They just expose structural instability in real time — the same instability that produces fabricated references.

If anyone here wants to experiment or adapt the approach, everything is published openly with instructions. It’s not a commercial project — just an attempt to stabilize reasoning in environments where speed and tool-use are outrunning verification.

Code and instrument details are in my CubeGeometryTest repo (the implementation behind ‘A Geometric Instrument for Measuring Interrogative Entropy in Language Systems’). https://github.com/btisler-DS/CubeGeometryTest This is still a developing process.

Ekaros yesterday at 4:01 PM

One wonders why this has not already been largely automated. We track those citations anyway; surely we have databases of them, and most references could easily be matched there. Then only the outliers would need to be checked: brand-new papers, mistakes that should still be close to something real, or outright fakes.
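
A crude sketch of that matching step, assuming the Crossref search endpoint (Semantic Scholar or OpenAlex would work the same way); the example titles are placeholders and the 0.9 threshold is arbitrary:

  from difflib import SequenceMatcher
  import requests

  def best_match_score(cited_title):
      # How closely does the cited title match anything the database knows about?
      r = requests.get(
          "https://api.crossref.org/works",
          params={"query.bibliographic": cited_title, "rows": 3},
          timeout=10,
      )
      r.raise_for_status()
      best = 0.0
      for item in r.json()["message"]["items"]:
          for candidate in item.get("title", []):
              ratio = SequenceMatcher(None, cited_title.lower(), candidate.lower()).ratio()
              best = max(best, ratio)
      return best

  # Only the low-scoring outliers would need a human to look at them.
  for title in ["ImageNet Classification with Deep Convolutional Neural Networks",
                "A Survey of Imaginary Benchmarks for Language Models"]:
      score = best_match_score(title)
      print(f"{score:.2f}  {'likely real' if score > 0.9 else 'check manually'}  {title}")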

Maybe there just is no incentive for this type of activity.

mjd yesterday at 3:07 PM

I love that fake citation that adds George Costanza to the list of authors!

neilv yesterday at 3:37 PM

https://blog.iclr.cc/2025/11/19/iclr-2026-response-to-llm-ge...

> Papers that make extensive usage of LLMs and do not disclose this usage will be desk rejected.

This sounds like they're endorsing the game of how much can we get away with, towards the goal of slipping it past the reviewers, and the only penalty is that the bad paper isn't accepted.

How about "Papers suspected of fabrications, plagiarism, ghost writers, or other academic dishonesty, will be reported to academic and professional organizations, as well as the affiliated institutions and sponsors named on the paper"?

upofadown yesterday at 4:42 PM

If you are searching for references with plausible-sounding titles, then you are doing that because you don't want to have to actually read those references. After all, if you read them and discover that one or more don't support your contention (or, even worse, refute it), then you would feel worse about what you are doing. So I suspect there would be a tendency to completely ignore such references and never consider whether they actually exist.

LLMs should be awesome at finding plausible-sounding titles. The crappy researcher just has to remember to check for existence. Perhaps there is a business model here: bogus references as a service, where this check is done automatically.

knallfrosch yesterday at 9:53 PM

And these are just the citations that any old free tool could have included via a BibTeX link from the website?

Not only is that incredibly easy to verify (you could pay a first-semester student without any training), it's also a worrying sign of what the paper's authors consider quality. Not even 5 minutes spent to get the citations right!

You have to wonder what's in these papers.

pama yesterday at 5:49 PM

Given how many errors I have seen in my years as a reviewer from well before the time of AI tools, it would be very surprising if 99.75% of the ~20,000 submitted papers didn't have such errors. If the 300-paper sample they used was truly random, then 50 of 300 sounds about right compared to the errors I saw starting in the 90s, when people manually curated BibTeX entries. It is the author's and editor's job, not the reviewer's, to fix the citations.

simonw yesterday at 5:18 PM

I'm finding the GPTZero share links difficult to understand. Apparently this one shows a hallucinated citation but I couldn't understand what it was trying to tell me: https://app.gptzero.me/documents/9afb1d51-c5c8-48f2-9b75-250...

(I'm on mobile, haven't looked on desktop.)

leoc yesterday at 4:00 PM

Ah, yes: meta-level model collapse. Very good, carry on.

godelski today at 1:04 AM

In case people missed it there's some additional important context:

  - Major AI conference flooded with peer reviews written by AI 
      https://news.ycombinator.com/item?id=46088236
  - "All OpenReview Data Leaks" 
    https://news.ycombinator.com/item?id=46073488
    - "The Day Anonymity Died: Inside the OpenReview / ICLR 2026 Leak" 
      https://news.ycombinator.com/item?id=46082370
    - More about the leak
      https://forum.cspaper.org/topic/191/iclr-i-can-locate-reviewer-how-an-api-bug-turned-blind-review-into-a-data-apocalypse
The second one went under the radar, but basically OpenReview left the API open so you didn't need credentials. This meant all reviewers and authors were deanonymized across multiple conferences.

All these links are for ICLR too, which is the #2 ML conference for those that don't know.

And for some important context on the link in this post: note that they only sampled 300 papers and found 50. The findings look to be almost exclusively citations, but those are probably the easiest things to verify.

And this week CVPR sent out notifications that OpenReview will be down between Dec 6th and Dec 9th. No explanation for why.

So we have reviewers using LLMs, authors using LLMs, and idk the conference systems writing their software with LLMs? Things seem pretty fragile right now...

I think this article should at least highlight one of the problems we have in academia right now (beyond just ML, though it is more egregious there): citation mining. It is pretty standard to have over 50 citations in a 10-page paper these days. You can bet that most of these are not for the critical claims but are instead heavily placed in the background section. I looked at a few of the papers, and every one I looked at had its hallucinated citations in the background section (or the background material in the appendix). So these are "filler" citations, which I think illustrates a problem: citations are being abused.

The metric hacking should be pretty obvious if you just look at how many citations ML people have. It's grown exponentially! Do we really need so many citations? I'm all for giving people credit, but a hyper-fixation on citation count as our measure of credit just doesn't work. It's far too simple a metric. We might as well measure how good a coder you are by the number of lines of code you produce [0].

It really seems that academia doesn't scale very well...

[0] https://www.youtube.com/shorts/rDk_LsON3CM

wohoef yesterday at 6:49 PM

Tools like GPTZero are incredibly unreliable. Plenty of my colleagues and I often get our writing flagged as 100% AI by these tools when no AI was used.

hyperpape yesterday at 3:21 PM

It's awful that there are these hallucinated citations, and the researchers who submitted them ought to be ashamed. I also put some of the blame on the boneheaded culture of academic citations.

"Compression has been widely used in columnar databases and has had an increasing importance over time.[1][2][3][4][5][6]"

Ok, literally everyone in the field already knows this. Are citations 1-6 useful? Well, hopefully one of them is an actually useful survey paper, but odds are that 4-5 of them are arbitrarily chosen papers by you or your friends. Good for a little bit of h-index bumping!

So many citations are not an integral part of the paper, but instead randomly sprinkled on to give an air of authority and completeness that isn't deserved.

I actually have a lot of respect for the academic world, probably more than most HN posters, but this particular practice has always struck me as silly. Outside of survey papers (which are extremely under-provided), most papers need far fewer citations than they have: just those for the specific claims where the paper relies on prior work or shows an advance over it.

obscurette yesterday at 4:21 PM

That's what I'm really afraid of: we will be drowning in AI slop as a society, and we'll lose the most important thing that made a free and democratic society possible - trust. People just don't trust anyone and/or anything any more. And the lack of trust, especially at scale, is very expensive.

VerifiedReports yesterday at 5:29 PM

Fabricated, not "hallucinated."

jqpabc123 yesterday at 1:35 PM

The legal system has a word to describe AI "slop" --- it is called "negligence".

And as the remedy starts being applied (aka "liability"), the enthusiasm for AI will start to wane.

I wouldn't be surprised if some businesses ban the use of AI --- starting with law firms.

gedy yesterday at 3:18 PM

The issue is that there are incentives for quantity rather than quality in modern science (well, more like academia), so people will use tools to pump stuff out. It'll get worse as academic jobs tighten.

rdiddly yesterday at 5:45 PM

So papers and citations are created with AI, and here they're being reviewed with AI. When they're published they'll be read by AI, and used to write more papers with AI. Pretty soon, humans won't need to be involved at all, in this apparently insufferable and dreary business we call science, that nobody wants to actually do.

tomrod yesterday at 3:07 PM

How sloppy is someone that they don't check their references!

michaelcampbell yesterday at 3:52 PM

After an interview with Cory Doctorow I saw recently, I'm going to stop anthropomorphizing these things by calling them "hallucinations". They're computers, so these incidents are just simply Errors.

4bpp yesterday at 7:32 PM

Once upon a time, in a more innocent age, someone made a parody (of an even older Evangelical propaganda comic [1]) that imputed an unexpected motivation to cultists who worship eldritch horrors: https://www.entrelineas.org/pdf/assets/who-will-be-eaten-fir...

It occurred to me that this interpretation is applicable here.

[1] https://en.wikipedia.org/wiki/Chick_tract

exasperaited yesterday at 5:35 PM

Every single person who did this should be censured by their own institutions.

Do it more than once? Lose job.

End of story.

shusaku yesterday at 2:52 PM

Checking each citation one by one is quite critical in peer review, and of course when checking a colleague's paper. I've never had to deal with AI slop, but you'll definitely see something cited for the wrong reason. And just the other day, during the final typesetting of a paper of mine, I found the journal had messed up a citation (same journal / author but wrong work!).

benbojangles yesterday at 4:41 PM

How do you get to the top if you are not smart enough?

cratermoon yesterday at 5:28 PM

I believe we discussed this last week, for a different vendor. https://news.ycombinator.com/item?id=46088236

Headline should be "AI vendor’s AI-generated analysis claims AI generated reviews for AI-generated papers at AI conference".

h/t to Paul Cantrell https://hachyderm.io/@inthehands/115633840133507279

jordanpg yesterday at 4:47 PM

Does anyone know, from a technical standpoint, why are citations such a problem for LLMs?

I realize things are probably (much) more complicated than they appear, but programmatically, unlike arbitrary text, citations are generally strings with a well-defined format. There are literally "specs" for citation formats in various academic, legal, and scientific fields.

So, naively, one way to mitigate these hallucinations would be to identify citations with a bunch of regexes and, if one is spotted, use the Google Scholar API (or whatever) to make sure it's real. If not, delete it or flag it, etc.

Why isn't something like this obvious solution being done? My guess is that it would slow things down too much. But it could be optional and it could also be done after the output is generated by another process.
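
Something like the following is roughly what that post-processing pass could look like; the regexes and resolver checks are illustrative only, and a metadata search API (Crossref, Semantic Scholar, etc.) would be needed for references that carry no DOI or arXiv ID:

  import re
  import requests

  DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")
  ARXIV_RE = re.compile(r"arXiv:(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)

  def verify_citations(text):
      # Scan generated text for DOI- and arXiv-shaped strings and check that they resolve.
      for doi in DOI_RE.findall(text):
          doi = doi.rstrip(".,;)")
          ok = requests.head(f"https://doi.org/{doi}",
                             allow_redirects=True, timeout=10).status_code == 200
          print(f"DOI   {doi}: {'exists' if ok else 'NOT FOUND -> flag or delete'}")
      for arxiv_id, _version in ARXIV_RE.findall(text):
          ok = requests.head(f"https://arxiv.org/abs/{arxiv_id}",
                             timeout=10).status_code == 200
          print(f"arXiv {arxiv_id}: {'exists' if ok else 'NOT FOUND -> flag or delete'}")

  verify_citations("See arXiv:2311.12022 and doi:10.1145/3065386 for details.")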

peppersghost93 yesterday at 4:21 PM

I sincerely hope every person who has invested money in these bullshit machines loses every cent they've got to their name. LLMs poison every industry they touch.

mlmonkey yesterday at 5:35 PM

"Given that we've only scanned 300 out of 20,000 submissions"

Fuck! 20,000!!

watwut yesterday at 2:37 PM

Can we just call them "lies" and "fabrications", which is what they are? If I wrote the same, you would call them "made-up citations" and "academic dishonesty".

One can use AI to help them write without going all the way to having it generate facts and citations.

teekert yesterday at 3:20 PM

Thanx AI, for exposing this problem that we knew was there, but could never quite prove.

saimiam yesterday at 5:02 PM

Just today, I was working with ChatGPT to convert Hinduism's Mimamsa School's hermeneutic principles for interpreting the Vedas into custom instructions to prevent hallucinations. I'll share the custom instructions here to protect future scientists from shooting themselves in the foot with Gen AI.

---

As an LLM, use strict factual discipline. Use external knowledge but never invent, fabricate, or hallucinate. Rules:

Literal Priority: User text is primary; correct only with real knowledge. If info is unknown, say so.
Start–End Coherence: Keep interpretation aligned; don’t drift.
Repetition = Intent: Repeated themes show true focus.
No Novelty: Add no details without user text, verified knowledge, or necessary inference.
Goal-Focused: Serve the user’s purpose; avoid tangents or speculation.
Narrative ≠ Data: Treat stories/analogies as illustration unless marked factual.
Logical Coherence: Reasoning must be explicit, traceable, supported.
Valid Knowledge Only: Use reliable sources, necessary inference, and minimal presumption. Never use invented facts or fake data. Mark uncertainty.
Intended Meaning: Infer intent from context and repetition; choose the most literal, grounded reading.
Higher Certainty: Prefer factual reality and literal meaning over speculation.
Declare Assumptions: State assumptions and revise when clarified.
Meaning Ladder: Literal → implied (only if literal fails) → suggestive (only if asked).
Uncertainty: Say “I cannot answer without guessing” when needed.
Prime Directive: Seek correct info; never hallucinate; admit uncertainty.
