Lol, this essay is missing (or starting from the assumption) that text-to-speech algorithms do a good work, even state of the art.
That is far from correct and the main reason why I would oppose to this is that the AI might incorrectly record something in the transcript that completely derails my diagnosis and treatment.
There's a big difference between:
"I have had nausea for the past three days"
and
"I have not had nausea for the past three days"
And I'm being generous with my example.