Hacker News

derbOac, yesterday at 6:52 PM

I agree completely.

My own experience watching citation patterns, and not even for things I've worked on, is that certain authors or groups attract attention for an idea or result for all kinds of weird reasons, and that attention drives citations even when they're not the originators of the results or ideas. This leads to weird patterns. The same results published before a certain "popular" paper get ignored, even when the "popular" paper is incredibly incremental or even a replication of previous work. Earlier authors who discussed the exact same idea, even well-known ones, are sometimes forgotten in favor of a newer, more charismatic author. Various studies have shown that retracted "zombie" papers continue to be cited at high rates as if they had never been retracted. And so on.

I've kind of given up trying to figure out what accounts for this. Most of the time it's just a kind of recency/availability bias, where people are basically lazy in their citations, or rushed for time, or whatever. Sometimes it's a combination of an older literature simply being forgotten and a more recent author with a lot of notoriety, for whatever reason, discussing the idea. Lots of times there's this weird cult-like buzz around a person, more about their personality or presentation than anything else: a certain person gets a reputation as a genius, and then people kind of assume that whatever they say or show hasn't been said or shown before, which becomes a self-fulfilling prophecy in citation patterns. I don't even think it matters whether what they say is valid; it just has to garner a lot of attention and agreement.

In any event, in my field I don't attribute much of a researcher's fame to anything other than fame itself. The Matthew effect is real, and it can take hold very rapidly, for all sorts of reasons. People also have short attention spans and little memory for history.

This is all especially true of more recent literature. Citation patterns pre-1995 or so, as is the case with those Wikipedia citations, are probably not representative of the current state.


Replies

cubefox, yesterday at 8:24 PM

Yeah. One example of people mindlessly mass-citing some random paper: chain-of-thought (CoT) prompting was used in the past to greatly enhance the reasoning ability of LLMs. Usually this paper is cited when CoT is discussed:

https://arxiv.org/abs/2201.11903

It has over 20,000 citations according to Google Scholar. But clearly the technique was not invented by these authors. It was known 1.5 years earlier, just after GPT-3 came out:

https://xcancel.com/kleptid/status/1284069270603866113#m

Perhaps even earlier. But the paper above gets cited nonetheless, probably because there is pressure to cite something and its title makes it sound as if the authors pioneered the technique. I doubt many people who cite it have even read it.

Another funny example: in machine learning and some other fields, a performance measure called the "Matthews correlation coefficient" (MCC) is used. It's named after some biochemist, Brian Matthews, who used it in a 1975 paper. Needless to say, he didn't invent it at all; he just used the well-known binary version of the well-known correlation coefficient (the phi coefficient, i.e. Pearson's correlation applied to two binary variables). The people who named the measure "MCC" apparently thought he invented it. Matthews probably just didn't bother to cite any sources himself.
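To make the point concrete, here is a minimal Python sketch (the function names and the toy label vectors are made up for illustration). It computes the "MCC" from the 2x2 confusion-matrix counts and, separately, the ordinary Pearson correlation of the same 0/1 vectors, and the two numbers agree.

```python
import math

def mcc(y_true, y_pred):
    """'Matthews correlation coefficient' from the 2x2 confusion matrix.

    Labels are assumed to be 0/1. The denominator is zero (and the
    coefficient undefined) if either vector is constant.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom

def pearson(x, y):
    """Plain Pearson correlation coefficient of two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ground-truth labels and classifier predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(mcc(y_true, y_pred), pearson(y_true, y_pred))
```

Here TP = 4, TN = 4, FP = 1, FN = 1, so the confusion-matrix formula gives (16 - 1) / 25 = 0.6, and the Pearson correlation of the raw 0/1 vectors also comes out to 0.6 (up to floating-point rounding).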