The paper looks like it has a large sample size, but it actually has only 48 testers/flippers. Some of the videos of those testers show very low, low-rpm coin tosses with only 1-2 rotations, and those testers presumably flipped thousands of times in the same way. So the effective sample size in the study is very small (N = 48), and testers who don't flip properly (low rpm, low height, few coin rotations) can affect the results disproportionately.
The study authors' backgrounds don't look particularly focused on statistics. With 48 authors (all but 3 of whom flipped coins for the study), I would presume the role of some was more test subject than author. And isn't being a subject in your own study going to introduce bias? Surely if you're trying to prove to yourself that coins land on one side or the other given some factor, you will learn the technique to do it, especially if you are doing a low-rpm, low-height flip. Based on the study results, some of the flippers appear to have learned this quite well.
If the flippers (authors) had been convinced of the opposite (fair coins tend to land on the opposite side from which they started) and done the same study, I bet they could have collected data and written a paper with the results proving that outcome.
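A quick back-of-the-envelope (the numbers here are hypothetical, not from the paper) shows how just a few "trained" flippers can move the pooled estimate:

```python
# Hypothetical scenario: 45 flippers are essentially fair, while 3 learned a
# low-rpm toss that lands same-side 60% of the time; everyone flips 1,000 times.
fair, trained = 45, 3
flips_each = 1000

same_side = fair * flips_each * 0.50 + trained * flips_each * 0.60
total = (fair + trained) * flips_each
pooled = same_side / total

print(f"pooled same-side rate: {pooled:.4f}")
```

Even though 45 of 48 flippers are fair, the pooled rate comes out noticeably above 0.5, which is why per-flipper behavior matters so much here.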
> testers that don't flip properly
I think that's the point. It shows that people don't usually flip properly, leading to biased results.
The real lesson is probably that if you're skilled enough, and/or train for long enough, you can influence the odds significantly without anyone ever noticing anything.
The paper is an experimental validation of a previous paper that presented a statistical model. The experiment found exactly the results predicted by the model. The reason for the non-50/50 result is precession of the coin.
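For a sense of scale: the precession model of Diaconis, Holmes, and Montgomery predicts a same-side probability of roughly 0.508. A rough normal-approximation power calculation (my sketch, not from the paper) shows why detecting such a small bias takes a very large number of flips:

```python
from math import ceil
from statistics import NormalDist

def flips_needed(p1=0.508, p0=0.5, alpha=0.05, power=0.8):
    """Approximate flips needed to detect a bias of p1 vs. a fair p0,
    using the standard normal-approximation sample-size formula for a
    one-sample proportion test (two-sided)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    delta = p1 - p0
    return ceil((z_a + z_b) ** 2 * p0 * (1 - p0) / delta ** 2)

n = flips_needed()
print(n)  # on the order of tens of thousands of flips
```

Which is consistent with the study needing hundreds of thousands of recorded flips to pin the effect down precisely.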
Actually, I think it's sounder to approach this with clustered standard errors. The basic intuition is similar, but the sample size is what it is per person: your observations aren't independent across draws, though they are across people.
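A minimal sketch of what clustering by flipper does, on simulated data (the per-flipper biases below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: 48 flippers, 1,000 flips each; each flipper has their own
# same-side probability drawn around 0.508.
n_flippers, flips_each = 48, 1000
person_p = rng.normal(0.508, 0.02, size=n_flippers)
flips = rng.random((n_flippers, flips_each)) < person_p[:, None]  # True = same side

n = flips.size
p_hat = flips.mean()

# Naive SE pretends all 48,000 flips are independent draws
se_naive = np.sqrt(p_hat * (1 - p_hat) / n)

# Cluster-robust SE: residuals are summed within each flipper before squaring,
# so correlated flips from one person don't masquerade as independent evidence
cluster_sums = (flips - p_hat).sum(axis=1)
se_cluster = np.sqrt((cluster_sums ** 2).sum()) / n

print(f"p_hat={p_hat:.4f}  naive SE={se_naive:.5f}  clustered SE={se_cluster:.5f}")
```

With positive intra-flipper correlation the clustered SE comes out larger than the naive one, i.e. the naive analysis overstates how much the 350k-ish flips actually tell you.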
> only 48 testers/flippers
I assumed these coin flips were done using a machine. But I guess they used human flippers because they wanted to make claims about the human coin flip phenomenon.
1-2 flips should just invalidate the toss. Anyone in a real scenario upon seeing this would call shenanigans.
We need some minimum flippage for the toss to count.
> the role of some might have been more test subject than author
The reason is that co-authorship was used as an incentive:
> Intrigued? Join us and earn a co-authorship
Per the linked YouTube video.
If you are doing anything with human subjects, even something dumb like having them flip coins for an hour while recording the results, you need approval from your local ethics board.
If you are doing self-experimentation, you do not.
48 "authors" is a bit extreme, but it's the norm to do some light human research with a half dozen authors as the subjects.
> testers that don't flip properly
Clearly the coin flips at the beginning of sports fixtures need to be assessed by a panel of highly skilled judges who can pronounce on their validity. We'll also need local, regional, national, and international organizations to train, select, and maintain the quality of coin-flipping judges, and to preserve the integrity of the discipline as new coins are minted and different sorts of flipping styles are proposed. Membership of such organizations should be limited to those affiliated with the Ancient Order of Coin Flippers.