Show HN: Sparrow-1 – Audio-native model for human-level turn-taking without ASR

123 points • by code_brian • 01/14/2026 • 49 comments

For the past year I've been working to rethink how AI manages timing in conversation at Tavus. I've spent a lot of time listening to conversations. Today we're announcing the release of Sparrow-1, the most advanced conversational flow model in the world.

Some technical details:

- Predicts conversational floor ownership, not speech endpoints

- Audio-native streaming model, no ASR dependency

- Human-timed responses without silence-based delays

- Zero interruptions at sub-100ms median latency

- In benchmarks, Sparrow-1 outperforms all existing models on real-world turn-taking
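For contrast, here is a minimal sketch of the silence-based endpointing that the list above says Sparrow-1 avoids. It is not Sparrow-1's method. The function name, the 20 ms frame size, and the 600 ms timeout are illustrative assumptions, and the per-frame speech flags stand in for the output of any voice activity detector.

```python
from typing import List, Optional


def silence_endpoint(
    frame_is_speech: List[bool],
    frame_ms: int = 20,       # assumed frame duration
    hangover_ms: int = 600,   # assumed silence timeout
) -> Optional[int]:
    """Return the time (ms) at which a fixed-timeout endpointer declares
    the turn over, or None if it never does."""
    needed = hangover_ms // frame_ms  # consecutive silent frames required
    silent_run = 0
    heard_speech = False
    for i, is_speech in enumerate(frame_is_speech):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return (i + 1) * frame_ms
    return None


# Speaker finishes at 200 ms; the endpointer only reacts at 800 ms.
finished = [True] * 10 + [False] * 40
print(silence_endpoint(finished))

# Speaker pauses to think for 700 ms, then resumes; the endpointer has
# already fired during the pause, which is an interruption.
thinking = [True] * 5 + [False] * 35 + [True] * 5
print(silence_endpoint(thinking))
```

The two cases show the trade-off a fixed timeout forces: every response is delayed by the full timeout, and any pause longer than the timeout is treated as the end of the turn.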

I wrote more about the work here: https://www.tavus.io/post/sparrow-1-human-level-conversation...

Comments

cuuupid • 01/15/2026

The first time I met Tavus, their engineers (incl Brian!) were perfectly willing to sit down and build their own better Infiniband to get more juice out of H100s. There is pretty much nobody working on latency and realtime at the level they are. Sparrow-1 would be a defining achievement for most startups but will just be one of dozens for Tavus :)


ljoshua • 01/15/2026

Hey @code_brian, would Tavus make the conversational audio model available outside of the PALs and video models? Seems like this could be a great use case for voice-only agents as well.


randyburden • 01/15/2026

Awesome. We've been using Sparrow-0 in our platform since launch, and I'm excited to move to Sparrow-1 over the next few days. Our training and interview pre-screening products rely heavily on Tavus's AI avatars, and this upgrade (based on the video in your blog post) looks like it addresses some real pain points we've run into. Really nice work.


arkobel • 01/19/2026

Have you compared with Krisp-TT models? https://krisp.ai/blog/krisp-turn-taking-v2-voice-ai-viva-sdk... Krisp LLC also shares an End-of-Turn Test dataset. Did you test your model on that? https://huggingface.co/datasets/Krisp-AI/turn-taking-test-v1

And can you share some information about the model size and FLOPS?

dfajgljsldkjag • 01/15/2026

I am always skeptical of benchmarks that show perfect scores, especially when they come from the company selling the product. It feels like everyone claims to have solved conversational timing these days. I guess we will see if it is actually any good.


nextaccountic • 01/15/2026

> Non-verbal cues are invisible to text: Transcription-based models discard sighs, throat-clearing, hesitation sounds, and other non-verbal vocalizations that carry critical conversational-flow information. Sparrow-1 hears what ASR ignores.

Could Sparrow instead be used to produce high-quality transcription that incorporates non-verbal cues?

Or even use Sparrow AND another existing transcription/ASR system to augment the transcription with non-verbal cues?
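A rough sketch of that second idea, assuming both systems emit timestamped events: interleave the ASR words with the non-verbal cues by time. Nothing here reflects an actual Sparrow or Tavus API; the function name, the `(seconds, text)` tuples, and the bracketed cue format are all hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (start time in seconds, text); assumed format


def merge_transcript(words: List[Event], cues: List[Event]) -> str:
    """Interleave ASR words and non-verbal cues in time order.
    On a timestamp tie the word comes before the cue."""
    tagged = [(t, 0, w) for t, w in words]
    tagged += [(t, 1, f"[{c}]") for t, c in cues]
    tagged.sort(key=lambda item: (item[0], item[1]))
    return " ".join(token for _, _, token in tagged)


words = [(0.0, "I"), (0.4, "think"), (1.9, "yes")]
cues = [(0.9, "sigh")]
print(merge_transcript(words, cues))  # I think [sigh] yes
```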


nubg • 01/15/2026

Btw, while I think this is cool and useful for real-time voice interfaces for the general populace, I wonder if for professional users (e.g. a dev coding by dictating all day) a simple push-to-talk is not always going to be superior. You can make long pauses while you think about something; this would creep out a human, but the AI would wait patiently for your push-to-talk.


krautburglar • 01/15/2026

Such things were doing a good-enough job scamming the elderly as it is--even with the silence-based delays.


pugio • 01/15/2026

It sounds really cool, but I don't see any way of trying the model directly. I don't actually want a "Persona" or "Replica" - I just want to use the sparrow-one model. Is there any way to just make API calls to that model directly?

nubg • 01/15/2026

Any examples available? Sounds amazing.


allan_s • 01/15/2026

How does it compare with https://github.com/KoljaB/RealtimeVoiceChat, which is absent from the benchmark?


orliesaurus • 01/15/2026

Literally no way to sign up to try. I put in my email and password and it put me on some waitlist, despite the video saying I could try the model today. What makes me mad about these kinds of releases is that the marketing and the product don't talk to each other.


sourcetms • 01/15/2026

How do I try the demo for Sparrow-1? What is pricing like?


ttul • 01/15/2026

I tried talking to Claude today. What a nightmare. It constantly interrupts you. I don’t mind if Claude wants to spend ten seconds thinking about its reply, but at least let ME finish my thought. Without decent turn-taking, the AI seems impolite and it’s just an icky experience. I hope tech like this gets widely distributed soon because there are so many situations in which I would love to talk with a model. If only it worked.


mentalgear • 01/15/2026

Metric | Sparrow-1
Precision | 100%
Recall | 100%

Common ...
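For reference, a minimal sketch of how precision and recall are usually computed for end-of-turn predictions. This is not the benchmark's evaluation code; the function name and the use of integer utterance IDs are assumptions for illustration.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[int], actual: Set[int]) -> Tuple[float, float]:
    """predicted: IDs the model flagged as end-of-turn.
    actual: IDs that truly were end-of-turn.
    Empty sets score 1.0 by convention (an assumption of this sketch)."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# One false alarm (ID 1) and two missed turn ends (IDs 5 and 6).
print(precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6}))  # (0.75, 0.6)
```

A score of 100% on both means no false alarms and no missed turn ends on the evaluated set.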


vpribish • 01/15/2026

What is "ASR" - automatic speech recognition?

