This seems a bit like a needlessly publicized finding. Surely our baseline assumption is that there are lots of systems that aren't very good at finding cancer. We're interested in the systems that are good. You only need one good system to adopt. Yes, it's good scientific hygiene to do the study and publish it, saying "Well, this particular thing isn't good, let's move on." But my expectation is that you just keep going until you design a system that does do well, and then adopt that system.
If I pluck a guy off the street, get him to analyze a load of MRI scans, and he doesn't correctly identify cancer from them, I'm not going to publish an article saying "Humans miss X% of breast cancers", am I?
I've been adjacent to this field for a while, so take this for what it is. My understanding is that developing a system that can accurately identify a specific form or sub-form of cancer to a degree equal to or better than a human is doable now. However, developing a system that can generalize to many forms of cancer is not.
Why does this matter? Because procurement in the medical world is a pain in the ass. And no medical center wants to be dealing with 32 different startups each selling their own specific cancer detection tool.
Many people are confused and think the Bitter Lesson is that you can just keep scaling up a bigger and bigger model and eventually it becomes omnipotent.
I think finding that AI, or at least a specific model sold as being able to do something, can't reasonably do it is an entirely reasonable thing to publish.
In the end it is on the model's marketer to prove that what they sell does what it claims. And counterexamples are a fully valid thing to then release.