that example of the radiologist reviewing cases touches on one worry i have about automation with a human in the loop for safety: a human in the loop won't work as a safeguard unless they are meaningfully engaged, rather than just acting as a passive reviewer.
how do you sustain attention and thoughtfully review radiological scans when 99% of the time you agree with the automated assessment? i'm pretty sure that no matter how well-trained the doctor is, they will end up just spamming "LGTM" after a while.
The likelihood is that models will "box" questionable stuff for radiologist review, and the boxing threshold will probably be set low enough that radiologists stay sharp (though we probably won't do this at first and skills may atrophy for a bit).
This is also a free source of training data over time, so the market incentives are there.
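A minimal sketch of what that "boxing" logic could look like (the threshold and the forced review rate below are made-up assumptions, not anyone's actual deployment):

```python
import random

REVIEW_RATE_FLOOR = 0.15  # assumed: always route some confident cases to a human too

def route_case(model_confidence: float, threshold: float = 0.90) -> str:
    """Decide whether a scan is auto-cleared or boxed for radiologist review.

    The threshold is deliberately set low enough that a steady stream of
    cases, including ones the model is fairly sure about, still reaches the
    human, so the reviewer keeps making real decisions instead of
    rubber-stamping.
    """
    if model_confidence < threshold:
        return "radiologist_review"   # model is unsure: always box it
    if random.random() < REVIEW_RATE_FLOOR:
        return "radiologist_review"   # confident case sampled in to keep the reviewer sharp
    return "auto_cleared"

# every human verdict logged against the model's output is the
# "free training data" part of the incentive
```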
I have the same question about minor legislative amendments that a certain agency keeps requesting in relation to its own statutory instrument. Obviously they are going to be passed without much scrutiny: they all seem small and the agency is pretty trustworthy.
(this is an unsolved problem that exists in many domains from long before AI)
This is a software problem! You can make the job more engaging by sneaking in secret lies to see if the human is paying attention.
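Something along the lines of the sketch below, say (the names and the canary rate are purely illustrative assumptions):

```python
import random
from dataclasses import dataclass

@dataclass
class Case:
    image_id: str
    is_planted: bool = False   # a known-abnormal "secret lie" slipped into the queue

CANARY_RATE = 0.02  # assumed: ~2% of the reviewer's queue is planted test cases

def build_review_queue(real_cases: list[Case], planted_cases: list[Case]) -> list[Case]:
    """Mix a small number of known-positive cases into the reviewer's queue."""
    n_canaries = max(1, int(len(real_cases) * CANARY_RATE))
    queue = real_cases + random.sample(planted_cases, k=min(n_canaries, len(planted_cases)))
    random.shuffle(queue)
    return queue

def vigilance_score(queue: list[Case], flagged_ids: set[str]) -> float:
    """Fraction of planted cases the reviewer actually caught."""
    planted = [c for c in queue if c.is_planted]
    caught = sum(1 for c in planted if c.image_id in flagged_ids)
    return caught / len(planted) if planted else 1.0
```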
How do you sustain attention in the other big X-ray use: security scanning? Most screeners will never see a bomb, so how do you ensure that they'll actually see one when it does happen?
The answer they've come up with is periodic tests and audits.