My personal opinion: AI should still be kept out of anything mission critical, at all stages, except for evaluation.
Another commenter very correctly notes that this result is on 100% positive input. The same AI in “real life” would probably score much better eventually. But as you point out, if used as a confirmation tool it is definitely bad.
> The same AI in “real life” would probably score much better eventually
Either I don't understand your reasoning or you are very much wrong. A "real life" dataset would also contain real negatives: the misses on the positives stay exactly the same, while every negative is a fresh chance to produce a false positive. So the result would be equal if the false positive rate were zero and strictly worse if it were any higher. One should expect the same AI to score significantly worse in a real-life setting.
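The error-count arithmetic behind that claim can be sketched in a few lines (all numbers here are hypothetical, purely for illustration):

```python
# Toy sketch of the argument above: on a 100%-positive dataset the only
# errors are missed positives; adding real negatives keeps every one of
# those misses and can only add false positives on top.

def total_errors(n_pos, n_neg, miss_rate, false_positive_rate):
    """Expected number of mistakes on a dataset with this composition."""
    return n_pos * miss_rate + n_neg * false_positive_rate

# Same model, same hypothetical 25% miss rate on positives throughout.
positives_only  = total_errors(1000, 0,    0.25, 0.00)
mixed_fpr_zero  = total_errors(1000, 1000, 0.25, 0.00)  # zero false positives
mixed_fpr_nonzero = total_errors(1000, 1000, 0.25, 0.05)

# With a zero false-positive rate the mixed dataset yields the same
# error count as the positive-only one; any nonzero rate is strictly worse.
print(positives_only, mixed_fpr_zero, mixed_fpr_nonzero)
```

The point is simply that the mixed dataset's error count is a superset of the positive-only one's, so it can never come out lower.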