
nextaccountic, today at 6:07 AM

> Non-verbal cues are invisible to text: Transcription-based models discard sighs, throat-clearing, hesitation sounds, and other non-verbal vocalizations that carry critical conversational-flow information. Sparrow-1 hears what ASR ignores.

Could Sparrow instead be used to produce high-quality transcriptions that incorporate non-verbal cues?

Or even use Sparrow and an existing transcription/ASR system together, augmenting the transcript with non-verbal cues?
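
A rough sketch of that second idea, assuming both systems emit timestamped events. The `Word` and `Cue` types and the `merge_transcript` function are hypothetical names for illustration; nothing here reflects Sparrow-1's actual output format or any real ASR API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word from a (hypothetical) ASR system."""
    start: float  # start time in seconds
    text: str


@dataclass
class Cue:
    """One non-verbal event from a (hypothetical) cue detector."""
    start: float  # start time in seconds
    label: str    # e.g. "sigh", "throat-clearing"


def merge_transcript(words, cues):
    """Interleave words and bracketed non-verbal cues by start time.

    On identical timestamps the word is placed before the cue.
    """
    events = [(w.start, 0, w.text) for w in words]
    events += [(c.start, 1, f"[{c.label}]") for c in cues]
    events.sort()
    return " ".join(token for _, _, token in events)


words = [Word(0.0, "Well"), Word(1.2, "I"), Word(1.4, "guess"), Word(1.8, "so")]
cues = [Cue(0.5, "sigh"), Cue(1.0, "throat-clearing")]
print(merge_transcript(words, cues))
# prints: Well [sigh] [throat-clearing] I guess so
```

The hard part in practice would probably be aligning the two systems' clocks, which this sketch simply assumes away.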