Hacker News

AI misses nearly one-third of breast cancers, study finds

136 points by Liquidity, yesterday at 6:43 AM | 69 comments

Comments

directevolve, yesterday at 8:23 AM

This study suggests that fully AI-automated mammography with this model can currently deliver about 70% sensitivity in detecting breast cancer. It does not let us compare AI to unaided human performance. Because the study included no healthy controls, there is no false positive rate, and that is a crucial missing metric, since the vast majority of women screened do not have breast cancer.
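To make that concrete, here is a rough sketch of why the false positive rate dominates at screening prevalence; the prevalence and specificity numbers below are illustrative assumptions, not figures from the study:

    # Back-of-the-envelope; only the ~70% sensitivity comes from the study.
    prevalence = 0.005      # assume ~5 cancers per 1,000 women screened
    sensitivity = 0.70      # the ~70% reported for the AI model
    specificity = 0.90      # assumed; the study has no healthy controls to measure this

    tp_rate = prevalence * sensitivity
    fp_rate = (1 - prevalence) * (1 - specificity)
    ppv = tp_rate / (tp_rate + fp_rate)
    print(f"PPV ≈ {ppv:.1%}")  # ≈ 3%: most positive calls would be false alarms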

In nearly half the false negatives from both the mammogram and DWI datasets, the cancer was categorized as occult by two breast radiologists, meaning the cancer was invisible to a trained eye. The AI model's non-occult false negative rate on the mammography data is 19.3%.

For that 19.3% figure, see Table 2: 68 non-occult cancers among the AI-missed cases versus 285 among the AI-detected cases, so 68 / (68 + 285) ≈ 19.3%.

This study did not compare the AI to a radiologist on a mixed set of healthy and cancer images.

levocardia, yesterday at 7:06 AM

The original study: https://link.springer.com/article/10.1007/s11547-025-02161-1

It was retrospective-only, i.e. a case series on women who were known to have breast cancer, so there were zero false positives and zero true negatives, because every patient in the study truly had cancer; only sensitivity can be measured.

The AI system was a ConvNet in commercial use circa 2021, which is when the data for this case series were collected.

show 4 replies
sfink, yesterday at 8:01 AM

The title bothers me. It suggests to me that "AI" is a single thing. If two guys are tested and turn out to be not that great at reading MRI images, should the headline be "Male radiologists miss nearly one-third of breast cancers"?

If it said "AI something", I'd be fine with it. It's a statement about that something, not about AI in general. Use it as an adjective (short for "AI-using" I guess?), not a noun.

show 1 reply
Traster, yesterday at 10:26 AM

This seems a bit like a needlessly publicized finding. Surely our baseline assumption is that there are lots of systems that aren't very good at finding cancer. We're interested in the systems that are good; you only need one good system to adopt. Yes, it's good scientific hygiene to do the study and publish it, saying "Well, this particular thing isn't good, let's move on." But my expectation is that you just keep going until you design a system that does do well, and then adopt that system.

If I pluck a guy off the street, get him to analyze a load of MRI scans, and he doesn't correctly identify cancer from them, I'm not going to publish an article saying "Humans miss X% of breast cancers", am I?

show 3 replies
samrus, yesterday at 4:32 PM

What's the baseline? How many did a human get? We need to compare this to a baseline to know whether it's good or bad.

show 1 reply
swisniewski, yesterday at 8:26 AM

The description in the summaries sounds very flawed.

1. They only tested two radiologists, and they compared them to one model. Thus the results don't say anything about how radiologists in general perform against AI in general. The most generous thing the study can say is that two radiologists outperformed one particular model.

2. The radiologists were only given one type of image, and then only for the patients that were missed by the AI. The summaries don't say whether the test was blind. The study has three authors, all of whom appear to be radiologists, and it mentions that two radiologists looked at the AI-missed scans. This raises questions about whether the test was blind.

Giving humans data they know are true positives and saying "find the evidence the AI missed" is very different from giving a classification task to an AI model that was also trained to keep false positives down.

Humans are very good at finding patterns (even when they don't exist) when they want to find a pattern.

Even if the study was blind initially, trained human doctors would likely quickly notice that the data they are analyzing is skewed.

Even if they didn’t notice, humans are highly susceptible to anchoring bias.

Anchoring bias is a cognitive bias where individuals rely too heavily on the first piece of information they receive (the "anchor") when making subsequent judgments or decisions.

The skewed nature of the data has a high potential to amplify any anchoring bias.

If the experiment had controls, any measurement error resulting from human estimation errors could potentially cancel out (a large random sample of either images or doctors should be expected to have the same estimation errors in each group). But there were no controls at all in the experiment, and the sample size was very small. So the influence of estimation biases on the result could be huge.
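As a toy illustration of the controls point (all numbers here are made up for illustration; this is not a model of the actual study):

    import random

    random.seed(0)

    BIAS = 0.15  # extra positive calls from knowing every case is a true positive

    def read(n, true_rate, bias):
        # One reader over n known-positive images: fraction called positive.
        return sum(random.random() < min(1.0, true_rate + bias) for _ in range(n)) / n

    # Without controls: one arm of AI-missed images read with the anchoring bias.
    # The measured rate overstates what the reader could truly find (0.60 here).
    print("uncontrolled catch rate:", round(read(400, 0.60, BIAS), 2))

    # With controls: a second arm (say, AI-detected images) read under the same
    # biased conditions. Both absolute rates are inflated, but the difference
    # between arms still reflects the real gap (0.80 - 0.60 = 0.20).
    missed_arm = read(400, 0.60, BIAS)
    control_arm = read(400, 0.80, BIAS)
    print("controlled difference:", round(control_arm - missed_arm, 2))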

From what I can read in the summary, these results don’t seem reliable.

Am I missing something?

show 2 replies
titaniumrain, yesterday at 4:09 PM

The missed cases should be attributed to the specific model deployed in the product, not to AI as a general concept. Framing this limitation under a broad and alarming title is therefore misleading and unnecessary.

Moosturm, yesterday at 7:06 AM

Shouldn't A.I. be used in a way where it only assists? E.g. a doctor takes a look first, and if (s)he can't find anything, then A.I. checks as well (or in parallel).

show 3 replies
docdeek, yesterday at 7:09 AM

In the human follow-up, there was an improvement but there was still a gap:

> Their findings offered reassurance: DWI alone identified the majority of cancers the AI had overlooked, detecting 83.5% of missed lesions for one radiologist and 79.5% for the other.

The combination of AI and this DWI methodology seems to identify most of the cancers, but about 20% of the missed third still slips through, roughly 6-7% of all cancers in the cohort. I assume that, as these were confirmed diagnoses, they were caught with another method beyond DWI.
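A rough back-of-the-envelope check on that, combining the ~1/3 AI miss rate from the headline with the DWI recovery rates quoted above:

    # Back-of-the-envelope; both inputs are the figures quoted in this thread.
    ai_sensitivity = 2 / 3                # AI alone misses roughly one-third
    dwi_recovery = (0.835 + 0.795) / 2    # DWI re-read caught ~83.5% / ~79.5% of the AI misses

    combined = ai_sensitivity + (1 - ai_sensitivity) * dwi_recovery
    print(f"combined detection ≈ {combined:.0%}, still missed ≈ {1 - combined:.0%}")
    # Roughly 94% detected; about 6% of all cancers missed by both steps.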

emil-lp, yesterday at 7:03 AM

Please always present the confusion matrix. One number is (almost) useless.

I can detect 100% by

    def detect(x):
        # Calls every case positive: 100% sensitivity, 0% specificity.
        return True
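To make that concrete, here is a minimal sketch of the full confusion matrix for that trivial detector on a mixed cohort (the 10/990 split is illustrative, not from the study):

    def detect(x):
        # The trivial "always positive" detector from above.
        return True

    # Illustrative mixed cohort: 10 cancers, 990 healthy.
    y_true = [True] * 10 + [False] * 990
    y_pred = [detect(x) for x in y_true]

    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))

    print("sensitivity:", tp / (tp + fn))  # 1.0 -- looks perfect in isolation
    print("specificity:", tn / (tn + fp))  # 0.0 -- every healthy woman gets flagged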
show 4 replies
ggm, yesterday at 7:22 AM

Useful to show the failure rate for humans, and humans assisted by systems.

show 1 reply
joelthelion, yesterday at 9:33 AM

"AI" doesn't exist. There are probably hundreds of different breast cancer detection algorithms. Maybe the SOTA isn't good enough yet. That doesn't mean AI in general is fundamentally incapable of correctly detecting it.

nephihaha, yesterday at 11:12 AM

As someone else has pointed out, I would like to know how this compares to humans.

I hope something good comes out of this, as I have known women whose lives were deeply affected by this.

shevy-java, yesterday at 10:47 AM

So basically AI kills people.

This is Skynet 2.0 or 3.0. But shit. James Cameron may have to redo The Terminator, to include AI. Then again, who would watch such a movie?

zqy123007, yesterday at 3:09 PM

Okay guys, I developed an AI mammo screening product, so let me clear things up. You read it wrong, and I don't blame you. I doubt whoever wrote this actually has a good understanding of the numbers.

The setup:

1. 400-some confirmed patients.

2. The AI reads mammography ONLY and missed about 1/3.

3. On those AI-missed patients, radiologists do a second read on MRI, which is the gold standard for differential diagnosis.

Evidence: the referenced paper at the bottom, "Added value of diffusion-weighted imaging in detecting breast cancer missed by artificial intelligence-based mammography."

So, the whole point it (or its referenced paper) is trying to make is: mammography sucks, MRI is much better, which is a KNOWN FACT.

Now, let me give you some more background missing from the paper:

1. Why does mammography suck? Well, go google/GPT some images: it's essentially an X-ray of the breast, which compresses a 3D volume into a 2D averaged, pooled plane, which is information-lossy. So, AI or not, the sensitivity is limited by the modality.

2. How bad/good is mammography AI? I would say 80-85% sensitivity against a very thorough, experienced radiologist without making an unbearable amount of false positives, which probably translates to about 2/3 sensitivity against a real cancer cohort, so the referenced number is about right (a rough back-of-the-envelope version of this follows below).

3. Mammography sucks, so what's the point? It's cheap AND fast; you can probably do a walk-in and get an interpretation back in hours, whereas for MRI you probably need to schedule two weeks ahead, if not more. For yearly screening, it works for the majority of the population.
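Here is the rough back-of-the-envelope version of point 2; the radiologist's own sensitivity against a true cancer cohort is an assumed number for illustration, not something from the paper:

    # Back-of-the-envelope; radiologist_vs_cohort is an assumption for illustration.
    ai_vs_radiologist = 0.825     # AI catches ~80-85% of what a thorough radiologist catches
    radiologist_vs_cohort = 0.80  # assumed radiologist sensitivity against real cancers

    ai_vs_cohort = ai_vs_radiologist * radiologist_vs_cohort
    print(f"implied AI sensitivity vs. real cancers ≈ {ai_vs_cohort:.2f}")  # ≈ 0.66, about 2/3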

And a final pro tip:

1. Breast tumors are more prevalent than you think (maybe 1 in 2 by age 70+).

2. Most guidelines recommend that women 45+ do a yearly checkup.

3. If you have dense breasts (basically small and firm), add ultrasound screening to make sure.

4. Breastfeeding does good for both the mother and child; do that.

peace & love

show 1 reply
NVHacker, yesterday at 11:05 AM

Not all AIs are created equal.

davidguetta, yesterday at 11:26 AM

This is a terrible article.

"One AI is not great" is not an interesting finding and certainly not conclusive of "AI can't help or do the job".

It's like saying "some dude can't detect breast cancer" and suggesting all humans are useless.

noobermin, yesterday at 9:59 AM

There's a lot of goalpost-moving here. Measuring sensitivity alone is still useful; at the least it helps radiologists decide whether to use this specific model and how much to rely on it. Also, why does every study have to compare some particular model against all humans across human history?

andrewstuart, yesterday at 6:59 AM

AI finds nearly 2/3rds of breast cancers!

show 2 replies