I'm also working on a Chinese learning app (heyzima.com) and my "solution" to this was to use the TTS token/word log probabilities.
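For anyone curious what that could look like, here is a minimal sketch, not the actual heyzima.com code. It assumes the model returns a log probability for each token and that the tokens are already grouped by word. The function names, the input shape, the sample values, and the 0.5 threshold are all hypothetical, chosen only for illustration.

```python
import math

def word_confidences(words):
    """Turn per-token log probabilities into one confidence value per word.

    `words` is a list of words; each word is a list of (token, logprob) pairs.
    This input shape is an assumption, not any particular API's output format.
    """
    results = []
    for tokens in words:
        if not tokens:
            continue  # skip empty words to avoid dividing by zero
        text = "".join(tok for tok, _ in tokens)
        # Average the log probabilities so long words are not penalized,
        # then exponentiate to get a value in (0, 1].
        mean_logprob = sum(lp for _, lp in tokens) / len(tokens)
        results.append((text, math.exp(mean_logprob)))
    return results

def flag_low_confidence(words, threshold=0.5):
    """Return the words whose confidence is below `threshold` (arbitrary default)."""
    return [text for text, conf in word_confidences(words) if conf < threshold]

# Made-up sample data, for illustration only.
sample = [
    [("你", -0.05), ("好", -0.10)],
    [("妈", -1.60)],
]

for text, conf in word_confidences(sample):
    print(f"{text}: {conf:.2f}")
print("flagged:", flag_low_confidence(sample))
```

Averaging in log space gives the geometric mean of the token probabilities, which is one common way to get a per-word score. Taking the minimum token probability would be a stricter alternative.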