Failure cases aren't just "patient died." They also include all the times when ChatGPT's "advice" merely matched what the doctor already said, and all the times its advice was flat-out wrong and the patient correctly ignored it. Nobody knows how common those cases are.
So your failure cases are now "it agreed with the doctor" and "the patient correctly identified bad advice."
Where's the failure?