These same questions could be asked about self-driving cars, but they've been shown to be consistently safer drivers than humans. If this guy is getting consistently better results from AI+human than from humans alone, why would it matter that the former produces some errors, given that the latter produces more errors and costs more?
TFA's whole point is that there is no easy way to tell whether LLM output is correct. Driving mistakes give instant feedback on whether the AI doing the driving got it right. Bad comparison.
If the cars weren't considerably safer drivers than humans, they wouldn't be allowed on the road. There isn't nearly as much regulation blocking the deployment of this healthcare solution... until those errors actually start costing hospitals money through malpractice lawsuits (or don't), we won't know whether it will be allowed to remain in use.