I wish I had known about Pipecat a lot sooner. I found out about it a few weeks back, and since Gemma 4 launched, I've been building my own entirely local voice assistant using Gemma 4 + Kokoro TTS + Whisper from scratch - https://github.com/pncnmnp/strawberry.
Pipecat's smart turn model is really good for VAD - https://huggingface.co/pipecat-ai/smart-turn-v3
What do you have going on the hardware side? I want to plug this into hass but don’t know what hardware I need for reasonable latency.
Yeah, Gemma 4 was and is great fun to do this with - I'm building pretty much the same thing as you, in Go.
https://github.com/zarldev/zarl & https://www.zarl.dev/posts/hal-by-any-other-name