The caption highlight timing is very inaccurate. It looks like it just steps through each word on a fixed timer, rather than using timing information from the TTS engine?
Yes, it's just a fixed timer, and it uses the browser's TTS. Nothing fancy here, on purpose: when I did some research on TikTok videos, simpler/worse quality generally seemed to do better XD
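For anyone who does want tighter highlighting later: the browser's Web Speech API can fire `boundary` events on a `SpeechSynthesisUtterance`, each carrying a `charIndex` into the spoken text. Below is a rough sketch of mapping that offset to a word index. The helper names (`wordStartOffsets`, `wordIndexAt`, `makeBoundaryHandler`) and the `highlightWord` callback are made up for illustration and are not part of this project. Boundary events are not fired reliably for every browser and voice, so a fixed-timer fallback may still be needed.

```typescript
// Minimal shape of the fields we read from a SpeechSynthesisEvent.
interface BoundaryEventLike {
  name: string;      // "word" or "sentence" for boundary events
  charIndex: number; // offset into the utterance text
}

// Hypothetical helper: start offset of every whitespace-separated word.
function wordStartOffsets(text: string): number[] {
  const offsets: number[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    offsets.push(m.index);
  }
  return offsets;
}

// Hypothetical helper: index of the last word starting at or before
// charIndex (binary search). Returns -1 if charIndex precedes every word.
function wordIndexAt(offsets: number[], charIndex: number): number {
  let lo = 0;
  let hi = offsets.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (offsets[mid] <= charIndex) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Builds a handler that highlights the spoken word on each word boundary.
// highlightWord is an assumed callback supplied by the caption UI.
function makeBoundaryHandler(
  text: string,
  highlightWord: (wordIndex: number) => void
): (event: BoundaryEventLike) => void {
  const offsets = wordStartOffsets(text);
  return (event) => {
    if (event.name !== "word") return; // skip sentence boundaries
    const index = wordIndexAt(offsets, event.charIndex);
    if (index >= 0) highlightWord(index);
  };
}

// Browser wiring (not run here; needs window.speechSynthesis):
//   const utterance = new SpeechSynthesisUtterance(text);
//   utterance.onboundary = makeBoundaryHandler(text, highlightWord);
//   window.speechSynthesis.speak(utterance);
```

The offset mapping is kept separate from the speech call so it can be tried out without a browser, and so the same caption UI can be driven by either the timer or the boundary events.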