Hacker News

Building voice agents with Nvidia open models

112 points by kwindla last Wednesday at 4:08 PM | 12 comments

Comments

rickydroll yesterday at 2:42 PM

<pedantic>Voice recognition identifies who you are, speech recognition identifies what you say. </pedantic>

Example:

Voice recognition: arrrrrrgh! (Oh, I know that guy. He always gets irritated when someone uses the terms speech recognition and voice recognition incorrectly.)

Speech recognition: "Why can't you guys keep it straight? It is as simple as knowing the difference between hypothesis and theory."

atonse yesterday at 3:39 PM

Can't wait for this to land in MacWhisper. I like the idea of streaming dictation, especially when dictating long prompts to Claude Code.

amelius last Wednesday at 6:11 PM

I've been using festival under Linux.

https://manpages.ubuntu.com/manpages/trusty/man1/festival.1....

But it is quite old now and predates the DL/AI era.

Does anybody know of a good modern replacement that I can "apt install"?

nowittyusername last Wednesday at 10:52 PM

This is perfect for me. I just started working on the voice-related parts of my agent framework, and this will be of real use. Thanks.

jjcm last Wednesday at 10:06 PM

These models have gotten good enough to make command-by-voice interactions really pleasant. I'd love to try this with Cursor and use it entirely by voice.

deckar01 yesterday at 5:17 AM

It supports Turing T4, but not Ampere…

jauntywundrkind last Wednesday at 11:41 PM

There's also the excellent, also open-source unmute.sh, which alas is also Nvidia-only at this point: https://unmute.sh/
