You’d use a computer-generated transcript as a guide, not as proof - the proof is the recording of the person actually saying the thing, not the LLM’s best guess at what it imagined the person saying.
“At timestamp X, person Y said Z” says the robot, and then you dutifully scrub the audio to timestamp X to verify.
Is audio always kept in addition to transcripts? (genuine question, I rarely record either)