It seems so.
The session configuration accepts a parameter (modalities) that can restrict responses to text only. See it in https://platform.openai.com/docs/api-reference/realtime-clie....
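A minimal sketch of what that configuration message could look like. It assumes the beta Realtime API shape: a `session.update` client event carrying `modalities` and, optionally, `turn_detection` set to null to turn off server-side VAD. Check the linked reference for the current field names. The helper `build_session_update` is hypothetical and only builds the JSON; it does not open a connection.

```python
import json


def build_session_update(text_only: bool = True, server_vad: bool = False) -> str:
    """Build a hypothetical session.update event as a JSON string.

    Assumption: field names follow the beta Realtime API docs
    ("modalities", "turn_detection"); verify against the current reference.
    """
    session = {
        # ["text"] requests text-only responses; ["text", "audio"] also allows speech output
        "modalities": ["text"] if text_only else ["text", "audio"],
    }
    if not server_vad:
        # Assumed: a null turn_detection disables the server's built-in VAD,
        # leaving turn boundaries up to the client
        session["turn_detection"] = None
    return json.dumps({"type": "session.update", "session": session})


print(build_session_update())
```

The string would be sent over the already-open Realtime WebSocket as a single text message.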
Correct. You should also be able to save a lot by skipping their built-in VAD and doing turn detection locally (if you need it), so you avoid paying for silent inputs.
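A rough sketch of the local side, using a naive energy threshold instead of a real VAD model. It assumes 16-bit little-endian mono PCM split into fixed-size frames. The `threshold` and `hangover` values are illustrative guesses, not tuned numbers. Frames whose RMS stays below the threshold are dropped before upload. A short "hangover" tail is kept after speech so word endings are not clipped.

```python
import array
import math
import sys
from typing import Iterable, Iterator


def frame_rms(frame: bytes) -> float:
    """RMS amplitude of a little-endian 16-bit mono PCM frame."""
    samples = array.array("h")
    samples.frombytes(frame)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; convert to native order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def voiced_frames(frames: Iterable[bytes], threshold: float = 500.0,
                  hangover: int = 5) -> Iterator[bytes]:
    """Yield frames that look like speech plus up to `hangover` trailing quiet frames.

    Leading silence and long pauses are dropped, so they are never uploaded.
    """
    quiet_run = hangover  # start in the "silent" state
    for frame in frames:
        if frame_rms(frame) >= threshold:
            quiet_run = 0
            yield frame
        elif quiet_run < hangover:
            quiet_run += 1
            yield frame
```

With server VAD off, the kept frames would then be base64-encoded and appended to the input buffer. The client would also commit the buffer and request a response itself when it decides the turn has ended (assumed beta event names: `input_audio_buffer.append`, `input_audio_buffer.commit`, `response.create`). An energy gate is easily fooled by background noise, so a dedicated VAD model would be more robust.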