Hey, thanks for shouting us out!
Just to clarify, the audio-to-video part (which is the part we make) adds <300ms. The total end-to-end latency for the interaction is higher, since state-of-the-art LLM, TTS, and STT models still add quite a bit of latency on top of that.
TLDR: Adding Simli to your voice interaction shouldn't add more than ~300ms of latency.
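
If it helps to see the budget spelled out, here is a rough sketch of the arithmetic. It assumes the stages run strictly one after another with no streaming overlap, which is a simplification. The function names and the STT/LLM/TTS numbers in the usage lines are made-up placeholders, not measurements; only the ~300ms ceiling for the audio-to-video step comes from the comment above.

```python
# Upper bound for the audio-to-video (Simli) step, taken from the comment above.
SIMLI_MAX_MS = 300.0


def end_to_end_ms(stt_ms: float, llm_ms: float, tts_ms: float,
                  avatar_ms: float = SIMLI_MAX_MS) -> float:
    """Total latency of a strictly sequential pipeline (assumption: no overlap)."""
    return stt_ms + llm_ms + tts_ms + avatar_ms


def avatar_share(stt_ms: float, llm_ms: float, tts_ms: float,
                 avatar_ms: float = SIMLI_MAX_MS) -> float:
    """Fraction of the total latency spent in the audio-to-video step."""
    return avatar_ms / end_to_end_ms(stt_ms, llm_ms, tts_ms, avatar_ms)


if __name__ == "__main__":
    # Placeholder stage latencies in ms, purely illustrative.
    stt, llm, tts = 200.0, 500.0, 200.0
    print(f"total: {end_to_end_ms(stt, llm, tts):.0f} ms")
    print(f"audio-to-video share: {avatar_share(stt, llm, tts):.0%}")
```

Swap in your own measured numbers for the three upstream stages to see how much of the total the video step actually accounts for in your setup.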