To expound on the other questions.
To receive a call, you either need to be online and actively listening for calls, or optionally, you can enable auto listening. When another user calls you it will automatically put you in the call. On end call you will be put back in listening mode. I'm not really sure a great way to get around this without overly complicating it.
I believe because of the small overhead that's added there is just no reason not to layer encryption. At the end of the day I just wanted to see the bits I'm sending over the wire with my own eyes for assurance it's protected regardless of the fact that tor is protecting the data.
The streaming would be a nice improvement for latency. I would have to look into how this would work for the optional audio processing. Having one set file for transport also simplifys the some of the flow with encryption like salting and optional hmac authentication as these are derived from the sum of the entire file, not a portion of it.
> salting
Do you mean IVs? Can't you (for most algorithms) just use a monotonic counter when streaming blocks?
> optional hmac authentication
Wouldn't that just be done per-chunk instead of per-file?