Yuck, this is going to really harm scientific research.
There is already a problem with papers falsifying data, samples, etc.; LLMs being able to put out plausible papers is just going to make it worse.
On the bright side, maybe this will get the scientific community and science journalists to finally take reproducibility more seriously. I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".
> I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".
Most people (that I talk to, at least) in science agree that there's a reproducibility crisis. The challenge is there really isn't a good way to incentivize that work.
Fundamentally (unless you're independently wealthy and funding your own work), you have to measure productivity somehow, whether you're at a university, government lab, or the private sector. That turns out to be very hard to do.
If you measure the raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk. Some of it is good, but there is such a tidal wave of shit that most people, as a heuristic, write off your work based on the other people in your cohort.
So instead it's more common to try to incorporate how "good" a paper is, to reward people with a high quantity of "good" papers. That means quantifying something subjective, though, so you might use something like citation count as a proxy: if a work is impactful, it usually gets cited a lot. Eventually you may arrive at something like the H-index, defined as the largest number H such that you have H papers with at least H citations each. Now, the trouble with this method is that people won't want to "waste" their time on incremental work.
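For concreteness, here's a minimal sketch of that H-index calculation (in Python, assuming you already have each paper's citation count; the numbers are made up):

    def h_index(citations):
        # Largest H such that H papers have at least H citations each.
        counts = sorted(citations, reverse=True)
        h = 0
        for i, c in enumerate(counts, start=1):
            if c >= i:
                h = i
            else:
                break
        return h

    # Five papers with these (hypothetical) citation counts give an H-index of 3.
    print(h_index([10, 8, 5, 2, 1]))  # 3

Notice that a reproduction paper with two citations contributes nothing here, which is exactly the incentive problem described below.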
And that's the struggle here: even if we funded and rewarded people for reproducing results, they would always be bumping up the citation count of the original discoverer. But it's worse than that, because literally nobody is going to cite your reproduction. In 10 years, they'll just see the original paper and a few citing works reproducing it, and to save time they'll cite only the original.
There's clearly a problem with how we incentivize scientific work. And clearly we want to be in a world where people test reproducibility. However, it's very, very hard to get there when one's prestige and livelihood are directly tied to discovery rather than reproducibility.
Reproducibility is overrated and if you could wave a wand to make all papers reproducible tomorrow, it wouldn't fix the problem. It might even make it worse.
https://blog.plan99.net/replication-studies-cant-fix-science...
Have they solved the issue where papers that cite research already invalidated are still being cited?
If there is one thing scientific reports must require, it is not using AI to produce the documentation. AI can be applied to the data, but not to the source or anything else. AI is a tool, not a replacement for actual work.
For ML/AI/Comp sci articles, providing reproducible code is a great option. Basically, PoC or GTFO.
I heard that most papers in a given field are already not adding any value. (Maybe it depends on the field though.)
There seems to be a rule in every field that "99% of everything is crap." I guess AI adds a few more nines to the end of that.
The gems are lost in a sea of slop.
So I see useless output (e.g. crap on the app store) as having negative value, because it takes up time and space and energy that could have been spent on something good.
My point with all this is that it's not a new problem. It's always been about curation. But curation doesn't scale. It already didn't. I don't know what the answer to that looks like.
> LLMs being able to put out plausible papers is just going to make it worse
If correct form (LaTeX two-column formatting, citing the right papers and authors of the year, etc.) has been allowing otherwise reject-worthy papers to slip through peer review, academia arguably has bigger problems than LLMs.
On the bright side, an LLM can really help set up a reproduction environment.
Perhaps repro should become the basis of peer review?
I think, or at least I hope, that part of the value of LLMs will be in building restricted tools for specific needs. Instead of asking one to solve any problem, restrict the space to a tool that helps you reach your goal faster, without the statistical nature of LLMs.
Maybe it will also change the whole model of publication as the evaluation of science.
Reading the article, this is about CITATIONS which are trivially verifiable.
This is just article publishers not doing the most basic verification and failing to notice that the citations in the article don't exist (a check like the sketch below would catch them).
What this should trigger is a black mark for all of the authors and their institutions, both of which should receive significant reputational repercussions for publishing fake information. If they fake the easiest to verify information (does the cited work exist) what else are they faking?
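To illustrate how cheap that basic verification is, here is a rough sketch against the public Crossref API. The endpoint is real, but the DOIs below are hypothetical placeholders, and a production checker would also need to match titles and authors, since fabricated references often carry no DOI at all:

    import requests

    def doi_exists(doi: str) -> bool:
        # Crossref returns 200 for a known DOI and 404 for an unknown one.
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        return resp.status_code == 200

    # Hypothetical DOIs pulled from a submitted manuscript's reference list.
    cited_dois = ["10.1000/example-doi-1", "10.1000/example-doi-2"]
    for doi in cited_dois:
        print(doi, "OK" if doi_exists(doi) else "NOT FOUND")

A publisher could run something like this over every submission in seconds; the fact that fake citations still get through is a process failure, not a technical one.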
It will better expose the behaviour of fraudulent scientists.
> to finally take reproducibility more seriously
I've long argued for this, as reproduction is the cornerstone of science. There are a lot of potential ways to do it, but one I like is linking reproduction attempts to the original work. Suppose the OpenReview page had a link for "reproduction efforts", with at minimum an annotation marking confirmation or failure. This would be incredibly helpful to the community as a whole. Reproduction failures can be valuable even when the original work involves no fraud: in those cases a failure reveals important information about the necessary conditions the original work relies on.
But honestly, we'll never get this until we drop the entire notion of "novelty", "impact", and "publish or perish". Novelty is in the eye of the reviewer, and the lower the reviewer's expertise, the less novel a work seems (nothing is novel at a high enough level). Impact can almost never be determined a priori, and when it can, you already have people chasing those directions, because why the fuck would they not? But publish or perish is the biggest sin. It's one of those ideas that looks nice on paper, as if you were meaningfully determining who is working hard and who is hardly working. The truth is you can't tell without being in the weeds. The real result is that it stifles creativity, novelty, and impact, because it forces researchers to chase lower-hanging fruit: things you're certain will work and can get published. It creates a negative feedback loop as we compete: "X publishes 5 papers a year, why can't you?" I've heard those words even when X had far fewer citations (each of my works had more "impact").
Frankly, I believe fraud would drop dramatically if researchers weren't risking their job security. Fraud is incentivized by a cutthroat system where you're constantly defending your job, your work, and your grants. There will always be some fraud, but (with a few exceptions) researchers aren't rockstar millionaires, and it takes a lot of work to get to the point where fraud even pays off, so there's a natural filter.
I have the same advice as Mervin Kelly, former director of Bell Labs:
How do you manage genius?
You don't.
I'd need to see the same scrutiny applied to pre-AI papers. If a field has a poor replication rate, meaning there's a good chance that a given published paper is just so much junk science, is that better or worse than letting AI hallucinate the data in the first place?
In my mental model, the fundamental problem with reproducibility is that scientists have a very hard time finding a penny to fund such research. No one wants to grant "hey, I need $1m and 2 years to validate that paper from last year which looks suspicious".
Until we change how we fund science at a fundamental level, including how we assign grants, this will remain a very hard problem to deal with.