Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS

204 points • by MattHart88 • yesterday at 7:50 PM • 94 comments • view on HN

I built this because I wanted to see how far I could get with a voice-to-text app that used 100% local models so no data left my computer. I've been using a ton for coding and emails. Experimenting with using it as a voice interface for my other agents too. 100% open-source MIT license, would love feedback, PRs, and ideas on where to take it.

Comments

atlgator • yesterday at 10:06 PM

This thread is a support group for people who have each independently built the same macOS speech-to-text app.

➕ show 2 replies

arkensaw • yesterday at 11:22 PM

This is great, and I'm not knocking it, but every time I see these apps it reminds me of my phone.

My 2021 Google Pixel 6, when offline, can transcribe speech to text, and also corrects things contextually. it can make a mistake, and as I continue to speak, it will go back and correct something earlier in the sentence. What tech does Google have shoved in there that predates Whisper and Qwen by five years? And why do we now need a 1Gb of transformers to do it on a more powerful platform?

➕ show 3 replies

cupcake-unicorn • today at 12:09 AM

https://handy.computer/ already exists?

➕ show 4 replies

primaprashant • yesterday at 9:53 PM

Speech-to-text has become integral part of my dev flow especially for dictating detailed prompts to LLMs and coding agents.

I have collected the best open-source voice typing tools categorized by platform in this awesome-style GitHub repo. Hope you all find this useful!

https://github.com/primaprashant/awesome-voice-typing

goodroot • yesterday at 8:08 PM

Nice one! For Linux folks, I developed https://github.com/goodroot/hyprwhspr.

On Linux, there's access to the latest Cohere Transcribe model and it works very, very well. Requires a GPU though. Larger local models generally shouldn't require a subordinate model for clean up.

Have you compared WhisperKit to faster-whisper or similar? You might be able to run turbov3 successfully and negate the need for cleanup.

Incidentally, waiting for Apple to blow this all up with native STT any day now. :)

➕ show 3 replies

charlietran • yesterday at 8:00 PM

Thank you for sharing, I appreciate the emphasis on local speed and privacy. As a current user of Hex (https://github.com/kitlangton/Hex), which has similar goals, what are your thoughts on how they compare?

parhamn • yesterday at 9:03 PM

I see a lot of whisper stuff out there. Are these the same old OpenAI whispers or have they been updated heavily?

I've been using parakeet v3 which is fantastic (and tiny). Confused why we're still seeing whisper out there, there's been a lot of development.

➕ show 3 replies

raybb • yesterday at 11:37 PM

Would also like to know how it compares to https://github.com/openwhispr/openwhispr

I like that openwhisper lets me do on device and set a remote provider.

atlasagentsuite • yesterday at 11:53 PM

Genuinely curious — what's your approach to memory and context management? That's where most agent frameworks hit walls in production. We run a multi-agent orchestration system and the biggest unlock was treating each agent's context as a finite resource rather than unbounded context windows. Worth thinking about before you scale too far.

ericmcer • yesterday at 9:17 PM

I see quite a few of these, the killer feature to me will be one that fine tunes the model based on your own voice.

E.G. if your name is `Donold` (pronounced like Donald) there is not a transcription model in existence that will transcribe your name correctly. That means forget inputting your name or email ever, it will never output it correctly.

Combine that with any subtleties of speech you have, or industry jargon you frequently use and you will have a much more useful tool.

We have a ton of options for "predict the most common word that matches this audio data" but I haven't found any "predict MY most common word" setups.

➕ show 3 replies

konaraddi • yesterday at 8:13 PM

That’s awesome! Do you know how it compares to Handy? Handy is open source and local only too. It’s been around a while and what I’ve been using.

https://github.com/cjpais/handy

➕ show 6 replies

pmarreck • today at 12:05 AM

How does this compare with Superwhisper, which is otherwise excellent but not cheap?

ipsum2 • yesterday at 8:06 PM

Parakeet is significantly more accurate and faster than Whisper if it supports your language.

➕ show 4 replies

__mharrison__ • yesterday at 9:34 PM

Cool, I've been doing a lot of "coding" (and other typing tasks) recently by tapping a button on my Stream Deck. It starts recording me until I tap it again. At which point, it transcribes the recording and plops it into the paste buffer.

The button next to it pastes when I press it. If I press it again, it hits the enter command.

You can get a lot done with two buttons.

rcarmo • yesterday at 9:53 PM

Not sure why I should use this instead of the baked-in OS dictation features (which I use almost daily--just double-tap the world key, and you're there). What's the advantage?

➕ show 1 reply

mathis • yesterday at 8:40 PM

If you don't feel like downloading a large model, you can also use `yap dictate`. Yap leverages the built-in models exposed though Speech.framework on macOS 26 (Tahoe).

Project repo: https://github.com/finnvoor/yap

hyperhello • yesterday at 8:55 PM

Feature request or beg: let me play a speech video and transcribe it for me.

➕ show 1 reply

thatxliner • yesterday at 11:56 PM

why isn't the cleanup done on the transcription (as opposed to screen record)

janalsncm • yesterday at 10:24 PM

I think the jab at the bottom of the readme is referring to whispr flow?

https://wisprflow.ai/new-funding

tito • yesterday at 10:16 PM

This is great. I'm typing this message now using Ghost Pepper. What benefits have you seen from the OCR screen sharing step?

Supercompressor • yesterday at 9:16 PM

I've been looking for the opposite - wanting to dump text and it be read to me, coherently. Anyone have good recommendations?

➕ show 1 reply

guzik • yesterday at 9:00 PM

Sadly the app doesn't work. There is no popup asking for microphone permission.

EDIT: I see there is an open issue for that on github

➕ show 1 reply

dakila5 • yesterday at 10:58 PM

MacWhisper is also a good one

purplehat_ • yesterday at 9:35 PM

Hi Matt, there's lots of speech-to-text programs out there with varying levels of quality. 100% local is admirable but it's always a tradeoff and users have to decide for themselves what's worth it.

Would you consider making available a video showing someone using the app?

➕ show 1 reply

gegtik • yesterday at 9:15 PM

how does this compare to macos built in siri TTS, in quality and in privacy?

➕ show 1 reply

aristech • yesterday at 8:57 PM

Great job. How about the supported languages? System languages gets recognised?

➕ show 1 reply

douglaswlance • yesterday at 9:49 PM

does it input the text as soon as it hears it? or does it wait until the end?

Ecko123 • yesterday at 10:07 PM

[dead]