logoalt Hacker News

vijgauravtoday at 7:12 PM0 repliesview on HN

The 60-minute single-pass transcription is the part that actually matters. Most ASR models chunk audio and you lose speaker continuity across boundaries. If the diarization actually holds up on hour-long recordings without drifting, thats a real solve for podcast and meeting transcription workflows.