Pocket TTS: A high quality TTS that gives your CPU a voice

635 points • by pain_perdu • 01/15/2026 • 158 comments • view on HN

Comments

derHackerman • 01/16/2026

I read this, then realized I needed a browser extension to read my long case study and made a browser interface of this and put this together:

https://github.com/lukasmwerner/pocket-reader

➕ show 1 reply

armcat • 01/15/2026

Oh this is sweet, thanks for sharing! I've been a huge fan of Kokoro and event setup my own fully-local voice assistant [1]. Will definitely give Pocket TTS a go!

[1] https://github.com/acatovic/ova

➕ show 2 replies

lukebechtel • 01/15/2026

Nice!

Just made it an MCP server so claude can tell me when it's done with something :)

https://github.com/Marviel/speak_when_done

➕ show 3 replies

singpolyma3 • 01/15/2026

Love this.

It says MIT license but then readme has a separate section on prohibited use that maybe adds restrictions to make it nonfree? Not sure the legal implications here.

➕ show 6 replies

pain_perdu • 01/16/2026

I'm psyched to see so much interest in my post about Kyutai's latest model! I'm working on part of a related team in Paris that's building off Kutai's research to provide enterprise-grade voice solutions. If anyone building in this space I'd love to chat and share some our upcoming models and capabilities that I am told are SOTA. Please don't hesitate to ping me via the address in my profile.

➕ show 2 replies

mgaudet • 01/16/2026

Eep.

So, on my M1 mac, did `uvx pocket-tts serve`. Plugged in

> It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way—in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only

(Beginning of Tale of Two Cities)

but the problem is Javert skips over parts of sentences! Eg, it starts:

> "It was the best of times, it was the worst of times, it was the age of wisdom, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the spring of hope, it was the winter of despair, we had everything before us, ..."

Notice how it skips over "it was the age of foolishness,", "it was the winter of despair,"

Which... Doesn't exactly inspire faith in a TTS system.

(Marius seems better; posted https://github.com/kyutai-labs/pocket-tts/issues/38)

➕ show 5 replies

GaggiX • 01/15/2026

I love that everyone is making their own TTS model as they are not as expensive as many other models to train. Also there are plenty of different architecture.

Another recent example: https://github.com/supertone-inc/supertonic

➕ show 4 replies

NoSalt • 01/16/2026

> "You can also clone the voice from any audio sample by using our repo."

Ok, who knows where I can get those high-quality recordings of Majel Barrett' voice that she made before she died?

➕ show 1 reply

dale_glass • 01/16/2026

Is there any TTS engine that doesn't need cloning and has some sort of parameters one can specify?

Like what if I want to graft on TTS to an existing text chat system and give each person an unique, randomly generated voice? Or want to try to get something that's not quite human, like some sort of alien or monster?

➕ show 2 replies

Evidlo • 01/16/2026

How feasible would it be to build this project into a small static binary that could be distributed? The dependencies are pretty big.

➕ show 1 reply

Imustaskforhelp • 01/16/2026

Perhaps I have been not talking to voice models that much or the chatgpt voice always felt weird and off because I was thinking it goes to a cloud server and everything but from Pocket TTS I discovered unmute.sh which is open source and I think is from the same company as Pocket TTS/can I think use Pocket TTS as well

I saw some agentic models at 4B or similar which can punch above its weights or even some basic models. I can definitely see them in the context of home lab without costing too much money.

I think atleast unmute.sh is similar/competed with chatgpt's voice model. It's crazy how good and (effective) open source models are from top to bottom. There's basically just about anything for almost everyone.

I feel like the only true moat might exist in coding models. Some are pretty good but its the only industry where people might pay 10x-20x more for the best (minimax/z.ai subscription fees vs claude code)

It will be interesting to see if we will see another deepseek moment in AI which might beat claude sonnet or similar. I think Deepseek has deepseek 4 so it will be interesting to see how/if it can beat sonnet

(Sorry for going offtopic)

➕ show 1 reply

dust42 • 01/15/2026

Good quality but unfortunately it is single language English only.

➕ show 3 replies

nmstoker • 01/16/2026

It's impressive but it's a shame that it's 2026 and despite remarkably lifelike speech, so many models fall on common issues like heteronyms ("the couple had a row because they couldn't agree where to row their boat"), realistic number handling and so on.

➕ show 1 reply

akx • 01/16/2026

It's pretty good. And for once, a software-engineering-ly high-quality codebase, too!

All too often, new models' codebases are just a dump of code that installs half the universe in dependencies for no reason, etc.

snvzz • 01/15/2026

Relative to AmigaOS translator.device + narrator.device, this sure seems bloated.

Paul_S • 01/16/2026

The speed of improvement of tts models reminds me of early days of Stable Diffusion. Can't wait until I can generate audiobooks without infinite pain. If I was an investor I'd short Audible.

➕ show 5 replies

donpdonp • 01/16/2026

it'd be nice to get some idea of what kind of hardware a laptop needs to be able to run this voice model.

➕ show 1 reply

d4rkp4ttern • 01/16/2026

Super nice and convenient to use as a CLI. I made it into a plugin for Claude Code to give a 1-sentence spoken status update whenever it stops:

claude plugin marketplace add pchalasani/claude-code-tools

claude plugin install voice@cctools-plugins

More here: https://github.com/pchalasani/claude-code-tools?tab=readme-o...

febin • 01/17/2026

I've vibecoded a Rust port of Pocket TTS using candle.

https://github.com/jamesfebin/pocket-tts-candle

The port supports:

- Native compilation with zero Python runtime dependency

- Streaming inference

- Metal acceleration for macOS

- Voice cloning (with the mimi feature)

Note: This was vibecoded (AI-assisted), but features were manually tested.

OfflineSergio • 01/16/2026

This is amazing. The audio feels very natural and it's fairly good at handling complext text to speech tasks. I've been working on WithAudio (https://with.audio). Currently it only uses Kokoros. I need to test this a bit more but I might actually add it to the app. It's too good to be ignored.

➕ show 1 reply

syntaxing • 01/15/2026

Is there something similar for STT? I’m using whisper distill models and they work ok. Sometimes it gets what I say completely wrong.

➕ show 2 replies

smallerfish • 01/16/2026

Hopefully the browsers will improve their built in TTS soon. It's still pretty unusable unless you really need it.

➕ show 1 reply

tschellenbach • 01/15/2026

It's cool how lightweight it is. Recently added support to Vision Agents for Pocket. https://github.com/GetStream/Vision-Agents/tree/main/plugins...

britannio • 01/16/2026

This is impressive but in a sample I tried, it switched language on the second paragraph. I'm on a M4 Pro Macbook.

https://gist.github.com/britannio/481aca8cb81a70e8fd5b7dfa2f...

Zardoz84 • 01/16/2026

I'm missing the old days that connecting a SPOKE256 to the Spectrum and making it speak, looked like magic.

_ache_ • 01/16/2026

It's very impressive! I'm mean, it's better than other <200M TTS models I encounter.

In English, it's perfect and it's so funny in others languages. It sounds exactly like someone who actually doesn't speak the language, but got it anyway.

I don't know why Fantine is just better than the others in others languages. Javer seems to be the worst.

Try Jean in Spanish « ¡Es lo suficientemente pequeño como para caber en tu bolsillo! » sound a lot like they don't understand the language.

Or Azelma in French « C'est suffisament petit pour tenir dans ta poche. » is very good.I mean half of the words are from a Québécois accent, half French one but hey, it's correct French.

Però non capisce l'italiano.

lykahb • 01/16/2026

It'd be great if it supported stdin&stdout for text and wav. Then it could get piped right into afplay

➕ show 1 reply

agentifysh • 01/16/2026

Just added it to my codex plugin that reads summary of what it finishes after each turn and I am spooked! runs well on my macbook, much better than Samantha!

https://github.com/agentify-sh/speak/

➕ show 1 reply

exceptione • 01/16/2026

Question: does anyone recommend a TTS that automatically recognizes emotion from the text it self?

➕ show 2 replies

butz • 01/16/2026

How large is the model and is it possible to train it read other languages, not only English?

➕ show 1 reply

aki237 • 01/16/2026

This is impressive.

I just tried some sample verses, sounds natural.

But there seems to be a bug maybe? Just for fun, I had asked it to play the Real Slim Shady lyrics. It always seems to add 1 extra "please stand-up" in the chorus. Anyone see that?

➕ show 1 reply

aidenn0 • 01/16/2026

I'm sure I'm being stupid, but every voice except "alba" I recognize from Les Miserables; is there a character I'm forgetting?

➕ show 1 reply

kreelman • 01/17/2026

Had so much fun with this. Was able to get my favourite celebrities to warn me about things happening on this PC.

indigodaddy • 01/16/2026

Perfect timing that is exactly what I am looking for for a fun little thing I'm working on. The voices sound good!

g947o • 01/16/2026

I wonder if this could be adapted into an app that can run completely offline?

➕ show 1 reply

bboplifa • 01/19/2026

it is similar to chatterbox as far as realism at half the speed and no gpu needed which leads me to wonder, why is chatterbox so slow ?

anonymous344 • 01/16/2026

doesn't seem to know thai language. anyobody can suggest thai tts?

maxglute • 01/16/2026

Would be nice if preview supports variable speed.

grahamrr • 01/16/2026

voices sound great! i see sample rate can be adjusted, is there any way to adjust the actual speed of the voice?

➕ show 1 reply

fuzzer371 • 01/16/2026

Haven't we had TTS for like 20+ years? Why does AI need to be shoved into it all of a sudden. Total waste of electricity.

➕ show 2 replies

oybng • 01/16/2026

>If you want access to the model with voice cloning, go to https://huggingface.co/kyutai/pocket-tts and accept the terms, then make sure you're logged in locally with `uvx hf auth login` lol

➕ show 1 reply

tempaccountabcd • 01/16/2026

[dead]

alt Hacker News

Pocket TTS: A high quality TTS that gives your CPU a voice

Comments