Hacker News

blopker, today at 2:08 AM

There are still a few unsolved problems that require tuning for specific applications. Applications that own the video call have a much easier time, because they have access to each participant's individual audio stream. Applications like this one, however, have to deal with overlapping voices in a single mixed stream. If the app tries to attribute each utterance to an individual speaker, separating the voices is tough, and mistakes can lead to confusing transcripts. There are many small problems like this that make it hard in real-world usage. Domain-specific terms and proper nouns are another source of inaccuracy.
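To illustrate why owning the call helps: with one stream per participant, attributing speech can be as simple as asking which stream is loudest at each moment. Below is a rough sketch of that idea, assuming equal-length mono streams of float samples keyed by speaker name. The function names and the energy threshold are hypothetical, and real products likely use proper voice activity detection rather than raw energy.

```python
import math

def frame_rms(samples, frame_len):
    """Split one mono stream into fixed-length frames and return each frame's RMS energy.

    Trailing samples that don't fill a whole frame are ignored.
    """
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def attribute_frames(streams, frame_len, threshold=0.01):
    """Label each frame with the speaker whose own stream is loudest.

    `streams` maps a (hypothetical) speaker name to that speaker's separate
    mono stream. Frames where no stream exceeds `threshold` are labeled None
    (treated as silence).
    """
    energies = {name: frame_rms(s, frame_len) for name, s in streams.items()}
    n_frames = min(len(e) for e in energies.values())
    labels = []
    for i in range(n_frames):
        # Pick the stream with the highest energy in this frame.
        name, level = max(((n, e[i]) for n, e in energies.items()), key=lambda p: p[1])
        labels.append(name if level > threshold else None)
    return labels

# Usage: alice talks, then bob, then silence.
streams = {
    "alice": [0.5] * 4 + [0.0] * 4 + [0.0] * 4,
    "bob":   [0.0] * 4 + [0.3] * 4 + [0.0] * 4,
}
print(attribute_frames(streams, 4))  # ['alice', 'bob', None]
```

With a single mixed stream there is no per-speaker energy to compare, so the app has to separate or cluster voices from one signal instead. That is exactly where overlapping speech gets hard.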