What’s the current state of the art in low-power wake word detection and speech to text? Has anyone written a blog post on this?
I was able to run speech to text on my old Pixel 4, but it’s a bit flaky (the background process occasionally loses the audio device). I just want to detect a wake word, send everything after it to a remote LLM, and then run TTS on the text I get back.
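A rough sketch of that loop, in case it helps frame the question. Every stage here is a hypothetical callable (`detect_wake_word`, `transcribe`, `ask_llm`, `speak`) that you would back with a real engine, and the fixed `utterance_frames` capture is an assumption standing in for proper end-of-speech detection:

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, List


@dataclass
class VoiceLoop:
    # All four stages are placeholders, not real library APIs.
    detect_wake_word: Callable[[bytes], bool]  # True if this frame holds the wake word
    transcribe: Callable[[bytes], str]         # speech to text (local or remote)
    ask_llm: Callable[[str], str]              # remote LLM call
    speak: Callable[[str], None]               # text to speech
    utterance_frames: int = 3                  # assumed fixed capture length after the wake word

    def run(self, frames: Iterable[bytes]) -> List[str]:
        """Consume audio frames and return the LLM replies that were spoken."""
        replies: List[str] = []
        stream = iter(frames)
        for frame in stream:
            # Only the cheap detector runs on every frame.
            if not self.detect_wake_word(frame):
                continue
            # After a wake word, grab the next few frames as the utterance.
            audio = b"".join(islice(stream, self.utterance_frames))
            reply = self.ask_llm(self.transcribe(audio))
            self.speak(reply)
            replies.append(reply)
        return replies
```

The point of the structure is that the always-on part is just the detector; the expensive stages only run after a wake word, so they can live on another machine.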
Wake word detection is not expensive; you can do it on an ESP32: https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3... (and then send the audio to something beefier, as speech to text on the ESP32 itself will be marginal at best).
Wake word models can be tiny: on the order of 10k weights, which runs on an ESP32 or similar with plenty of compute to spare.
TinyML is a book that goes through the process of building a wake word model for such constrained environments.
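To put the 10k-weight figure in perspective, here is a back-of-the-envelope size calculation. The bit widths are illustrative choices, and the 512 KiB SRAM budget is an assumption about an ESP32-S3-class chip, so check your part's datasheet:

```python
def model_size_bytes(num_weights: int, bits_per_weight: int) -> int:
    """Storage needed for the weights alone, rounded up to whole bytes."""
    return (num_weights * bits_per_weight + 7) // 8


NUM_WEIGHTS = 10_000             # rough figure from the comment above
SRAM_BUDGET_BYTES = 512 * 1024   # assumed on-chip SRAM; verify for your chip

for bits in (8, 32):
    size = model_size_bytes(NUM_WEIGHTS, bits)
    share = 100 * size / SRAM_BUDGET_BYTES
    print(f"{bits}-bit weights: {size} bytes ({share:.1f}% of the assumed SRAM)")
```

This counts weights only; activations, audio buffers, and feature extraction need memory too, but even unquantized the model itself is a small slice of the budget.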
Maybe not SOTA but the HA Voice Preview Edition [1] in tandem with a Pi 5 or some similar low-power host for the Piper / Whisper pipeline is pretty good. I don't use it but was able to get an Alexa/Google Home-like experience going with minimal effort.
I was only using it for local Home Assistant tasks, didn't try anything further like retrieving sports scores, managing TODO lists, or anything like that.
[1] https://www.home-assistant.io/voice-pe/