The study, as described in the summaries, sounds very flawed.
1. They only tested 2 radiologists, and they compared them to a single model. So the results say nothing about how radiologists in general perform against AI in general. The most generous claim the study can support is that these 2 radiologists outperformed this particular model.
2. The radiologists were only given one type of image, and only for those patients that were missed by the AI. The summaries don't say whether the review was blind. The study has 3 authors, all of whom appear to be radiologists, and it mentions that 2 radiologists looked at the AI-missed scans. This raises questions about whether the review was blinded at all.
Giving humans data they know are true positives and saying "find the evidence the AI missed" is very different from giving a classification task to an AI model that has also been trained to reduce false positives.
Humans are very capable of finding patterns (even if they don’t exist) when they want to find a pattern.
Even if the study was blind initially, trained human doctors would likely quickly notice that the data they are analyzing is skewed.
Even if they didn’t notice, humans are highly susceptible to anchoring bias.
Anchoring bias is a cognitive bias where individuals rely too heavily on the first piece of information they receive (the "anchor") when making subsequent judgments or decisions.
The skewed nature of the data has a high potential to amplify any anchoring bias.
If the experiment had controls, any measurement error from human estimation could potentially cancel out (a large random sample of either images or doctors should be expected to have the same estimation errors in each group). But there were no controls at all in the experiment, and the sample size was very small, so the influence of estimation biases on the result could be huge.
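To make the "errors cancel out" point concrete, here is a minimal, purely hypothetical simulation (the group sizes, grading scale, and bias magnitude are all made up, none of it comes from the study): with a control arm, random grading error hits both arms roughly equally and washes out of the comparison, while a systematic bias like anchoring applied to a small, uncontrolled sample shifts the estimate directly.

```python
# Hypothetical sketch, not from the study: random grading error vs. a
# shared systematic bias (e.g. anchoring), with and without a control arm.
import numpy as np

rng = np.random.default_rng(0)

def graded_scores(n_cases, true_mean, random_sd, systematic_bias):
    """Simulated visibility grades: truth + random error + systematic bias."""
    return true_mean + rng.normal(0, random_sd, n_cases) + systematic_bias

# With a control arm, purely random error affects both arms equally, so the
# *difference* between arms stays close to the true difference (0 here).
test    = graded_scores(n_cases=1000, true_mean=5.0, random_sd=1.0, systematic_bias=0.0)
control = graded_scores(n_cases=1000, true_mean=5.0, random_sd=1.0, systematic_bias=0.0)
print("random error, large n, with control:", round(test.mean() - control.mean(), 3))  # ~0

# A systematic bias applied to a single small, uncontrolled arm shifts the
# estimate by roughly the full bias and never averages away.
biased = graded_scores(n_cases=25, true_mean=5.0, random_sd=1.0, systematic_bias=0.5)
print("biased arm, small n, no control:", round(biased.mean() - 5.0, 3))  # ~0.5 plus noise
```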
From what I can read in the summary, these results don’t seem reliable.
Am I missing something?
This article is about measuring how often an AI missed cancer by giving it data only where we know there was cancer.
> Am I missing something?
Yes. The article is not about AI performance vs human performance.
> Humans are very capable of finding patterns (even if they don’t exist) when they want to find a pattern
Ironic
They did NOT test radiologists. There were NO healthy controls. They evaluated the AI's false negative rate and used exclusively unblinded radiologists to grade the visibility and other features of the cancers.
The utility of the study is to evaluate potential AI sensitivity if it were used for mass, fully automated screening of mammography data. But it says NOTHING about the CRUCIAL false positive rate (no healthy controls) and NOTHING about AI vs. human performance.
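A back-of-the-envelope sketch of that last point (every number below is invented for illustration, none comes from the paper): a cancer-only test set pins down sensitivity, but the recall burden in mass screening is driven by the false positive rate, which this design cannot constrain at all.

```python
# Hypothetical numbers, only to illustrate why a cancer-only evaluation
# determines sensitivity but says nothing about the false positive rate.
cancers_in_test_set = 100      # study design: every case is a known cancer
missed_by_ai        = 10       # assumed number of false negatives

sensitivity = 1 - missed_by_ai / cancers_in_test_set
print(f"sensitivity (1 - FNR): {sensitivity:.0%}")   # 90%

# In a mass-screening population almost everyone is healthy, so the number of
# people recalled is driven by the false positive rate - unmeasured here.
screened   = 100_000
prevalence = 0.005             # ~5 cancers per 1000 screened (assumed)
for fpr in (0.01, 0.05, 0.10): # several FPRs, all equally consistent with the study
    false_alarms   = screened * (1 - prevalence) * fpr
    true_positives = screened * prevalence * sensitivity
    print(f"FPR {fpr:.0%}: {false_alarms:,.0f} healthy people flagged "
          f"vs {true_positives:,.0f} cancers caught")
```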
See my main comment elsewhere in this thread.