AI voices have gotten scarily good. They are easy to recognize because most creators use the same voices with the same intonations and don't care to cut out the mistakes. But if you don't recognize the voice, it takes a couple of sentences to discern that it's AI, even with an ear trained on the difference.
But it is funny to see how much stuff gets uploaded with zero quality control and still gets traction. These models really don't deal well with "innocent" letter substitutions, Iike using I instead of l.
I've heard enough slop using the ElevenLabs voices that I can recognize them almost immediately now. But you're right. Higher-end models with less familiar voices are harder to notice. One consistent failing is that they are always too perfect. No mistakes, and no signs of cuts where a human VA would have flubbed a line and edited it out. It's all very smooth and perfect, as if they nailed it on the first take. Once the cheap/free models manage to fix that, we are in real trouble. Also, some really lazy slop creators don't bother to fix issues with pronunciation. But that's not really the fault of the model.