This reminds me of the early days of applying speech recognition. Some use cases worked surprisingly well, like non-pretrained company directory name recognition. It was shockingly good, and it fails soft because there are only a small number of possible alternative matches.
Other cases, like games where the user's voice changes due to excitement/stress, were incredibly bad.