
hasperdi · today at 8:50 AM

What you said is possible by feeding the output of a speech-to-text tool into an LLM. You can prompt the LLM to make sense of what you're trying to achieve and turn it into a set of actions. With a CLI it's trivial: your spoken command can be translated into working shell commands. With a GUI it's slightly more complicated, because the LLM agent also needs to know what's on your screen, etc.
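For the CLI case, here's a minimal sketch of that pipeline in Python. It assumes the transcript has already come out of a speech-to-text tool and that some OpenAI-compatible endpoint is reachable; the base_url, model name, and use of the openai client are illustrative choices, not anything the tools above require:

    # Minimal sketch: turn a dictated request into a shell command.
    # Assumes the transcript was already produced by a speech-to-text
    # tool and that an OpenAI-compatible endpoint is running locally
    # (the URL and model name below are placeholders).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

    transcript = "show me the five largest files in my home directory"

    resp = client.chat.completions.create(
        model="local-model",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Translate the user's request into a single "
                        "POSIX shell command. Reply with the command only."},
            {"role": "user", "content": transcript},
        ],
    )
    command = resp.choices[0].message.content.strip()
    print(command)  # review before running!
    # subprocess.run(command, shell=True)  # only after a human check

Printing the command instead of executing it is deliberate: you'd want a confirmation step before letting dictated text run in a shell.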

That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.
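I don't know MacWhisper's exact request shape, but "OpenAI-compatible endpoint" generally just means something that answers /v1/chat/completions requests. As a rough sketch, a toy receiver you could point such a tool at (Flask is my illustrative pick here) might look like:

    # Toy OpenAI-compatible endpoint a dictation tool could target.
    # Assumes the standard /v1/chat/completions request/response shape;
    # everything here is illustrative, not MacWhisper-specific.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.post("/v1/chat/completions")
    def chat_completions():
        body = request.get_json()
        # The dictated text arrives as the last user message.
        text = body["messages"][-1]["content"]
        reply = f"Received dictation: {text}"  # real post-processing goes here
        return jsonify({
            "id": "chatcmpl-0",
            "object": "chat.completion",
            "model": body.get("model", "unknown"),
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }],
        })

    if __name__ == "__main__":
        app.run(port=8080)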


Replies

sipjca · today at 8:51 AM

Handy can post-process with LLMs too! It's just currently hidden behind a debug menu as an alpha feature (ctrl/cmd+shift+d).