When I started prototyping, it began as full duplex. I was up for the challenge and wanted to understand why this didn't really exist already.
It ended up being awful. In a standard real-time call you're able to interject and talk over someone, and this works because of the low latency. The other person can stop talking and the conversation still flows.
The latency from Tor makes it awkward to the point that you almost have to relearn how to have a conversation: by the time you interject and they hear the interjection, a whole 6 seconds may have passed and they may already be on a whole other train of thought. A walkie-talkie architecture just forces you to listen and digest the message, think, then respond.
There are two layers of encryption here: Tor, which already encrypts the data in transit, and local encryption via openssl before transmission.
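As a rough sketch of that local layer, assuming a password-derived key and one specific cipher: the file names, the SHARED_SECRET variable, and the choice of aes-256-cbc with -pbkdf2 are illustrative assumptions, not necessarily what the tool actually runs.

```shell
#!/bin/sh
# Sketch only: cipher, KDF flags, and names below are assumptions for illustration.
set -eu

workdir=$(mktemp -d)
opus_file="$workdir/msg.opus"   # stand-in for a recorded, Opus-encoded clip
enc_file="$workdir/msg.enc"     # what would actually be sent over Tor
dec_file="$workdir/msg.dec"     # what the receiving side would play

SHARED_SECRET="example-shared-secret"   # hypothetical; agreed out of band
export SHARED_SECRET

printf 'pretend this is opus audio' > "$opus_file"

# Local layer: encrypt before the bytes ever reach Tor.
openssl enc -aes-256-cbc -pbkdf2 -salt -pass env:SHARED_SECRET \
    -in "$opus_file" -out "$enc_file"

# Receiving side: decrypt with the same cipher and secret.
openssl enc -d -aes-256-cbc -pbkdf2 -pass env:SHARED_SECRET \
    -in "$enc_file" -out "$dec_file"

# Confirm the decrypted bytes match what was recorded.
if cmp -s "$opus_file" "$dec_file"; then
    round_trip="ok"
else
    round_trip="failed"
fi
echo "round trip: $round_trip"

rm -rf "$workdir"
```

Passing the secret through the environment with -pass env: keeps it out of the process argument list, which is one reason to prefer it over -pass pass: in a script like this.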
I have 21 ciphers programmed in from the OpenSSL library. There are a lot more available in the library, but these are supposedly the strongest. The cipher used is not secret, so while in the call you can see your cipher and the remote cipher in real time.
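One way a fixed cipher list could be enforced is a simple allowlist check before a cipher name is ever handed to openssl. The names below are placeholder examples (the actual 21 are not listed here), and is_allowed_cipher is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: the list is a made-up subset, not the tool's real 21 ciphers.
set -eu

allowed_ciphers="aes-256-cbc aes-256-ctr chacha20 camellia-256-cbc"

# Return 0 if $1 is in the allowlist, 1 otherwise.
is_allowed_cipher() {
    case " $allowed_ciphers " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

local_cipher="aes-256-cbc"   # hypothetical: what this side selected
remote_cipher="chacha20"     # hypothetical: what the other side announced

for c in "$local_cipher" "$remote_cipher"; do
    if is_allowed_cipher "$c"; then
        echo "cipher accepted: $c"
    else
        echo "cipher rejected: $c" >&2
    fi
done
```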
Authentication rests entirely in the user's lap. It's up to them to come up with a key exchange.
For me, I would be comfortable enough knowing the other side is who I think it is by the simple fact that audio is passing through. Establishing a connection once in person is ideal.
Because the .onion address is derived on device, an attacker can't just forge it. You also need the private key for a connection to be established.
Let's say an attacker copies the directory from one of the endpoints, now has the private key, and can launch Tor under your static address. Because the additional shared secret is encrypted with another password known only to the user, that copy is useless. The other side will not receive forged audio because of this barrier. The attacker may get as far as being in a call, but no audio is going to be played back because the shared secret was successfully protected. Even if you pass your message to the attacker, decryption will fail and nothing will pass.
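To make that barrier concrete, here is a minimal sketch of a shared secret stored encrypted under a user password, and what happens when someone with a copy of the file guesses wrong. The file names, the unlock_secret helper, and the cipher flags are all assumptions. Note that openssl enc with a CBC cipher only notices a wrong password through a padding check, so the sketch compares the recovered value rather than relying on the exit code alone.

```shell
#!/bin/sh
# Sketch only: names, passwords, and cipher flags are assumptions for illustration.
set -eu

workdir=$(mktemp -d)
secret_plain="$workdir/shared_secret.txt"
secret_enc="$workdir/shared_secret.enc"   # the only copy kept on disk

printf 'example-shared-secret' > "$secret_plain"
USER_PASS="correct horse battery staple"   # known only to the user
export USER_PASS

# Store the shared secret encrypted, then drop the plaintext copy.
openssl enc -aes-256-cbc -pbkdf2 -salt -pass env:USER_PASS \
    -in "$secret_plain" -out "$secret_enc"
rm -f "$secret_plain"

# Try to unlock the secret with the password given in $1.
unlock_secret() {
    TRY_PASS="$1" openssl enc -d -aes-256-cbc -pbkdf2 -pass env:TRY_PASS \
        -in "$secret_enc" 2>/dev/null
}

# The user, who knows the password, recovers the secret.
owner=$(unlock_secret "$USER_PASS")

# An attacker with a copy of the directory has to guess the password.
attacker=$(unlock_secret "wrong-guess") || attacker=""

if [ "$owner" = "example-shared-secret" ]; then
    echo "owner unlocked the secret"
fi
if [ "$attacker" != "example-shared-secret" ]; then
    echo "attacker did not recover the secret"
fi

rm -rf "$workdir"
```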
Directly after sending, we run rm -f on $raw_file, $opus_file, $enc_file.
Audio is received to the /audio folder. After audio is decrypted and played, we run rm -f $enc_file, $dec_file. It only lives on disk for a split second before it's gone.
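The receive-side cleanup could be sketched like this. The temporary directory stands in for the /audio folder, and decrypt_message and play_message are hypothetical placeholders for the real openssl and playback steps.

```shell
#!/bin/sh
# Sketch only: the helpers below are placeholders, not the tool's real steps.
set -eu

audio_dir=$(mktemp -d)                 # stand-in for the /audio folder
enc_file="$audio_dir/incoming.enc"
dec_file="$audio_dir/incoming.opus"

decrypt_message() { cp "$enc_file" "$dec_file"; }   # placeholder for decryption
play_message()    { :; }                            # placeholder for the audio player

printf 'ciphertext placeholder' > "$enc_file"       # pretend a message just arrived

decrypt_message
# Keep going to the cleanup even if playback fails.
play_message || echo "playback failed" >&2

# Remove both the encrypted and decrypted copies right after playback.
rm -f "$enc_file" "$dec_file"

# rmdir only succeeds on an empty directory, so this doubles as a check
# that nothing was left behind.
rmdir "$audio_dir"
echo "audio directory cleaned up"
```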