The walkie-talkie model is a smart design choice given Tor's latency profile. Real-time bidirectional audio has pretty unforgiving requirements (~150 ms round-trip max before it feels awkward), and Tor typically adds 50-200 ms per hop. Going store-and-forward sidesteps the whole problem: you're not fighting the network's characteristics, you're designing around them.
Curious what codec you're using for the audio compression. Opus would be the obvious choice for speech, but the tradeoffs change a bit when you're not doing real-time streaming.
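For a rough sense of the latency argument, here is a back-of-the-envelope sketch. It takes the 50-200 ms per-hop range and the ~150 ms threshold straight from the comment above (neither is measured), and assumes a 3-hop circuit versus a 6-hop onion-service path; the function name is just for illustration.

```python
# Back-of-the-envelope latency budget. The per-hop range and the threshold
# are the figures quoted in the comment, not measurements.
PER_HOP_MS = (50, 200)   # assumed added latency per hop (low, high)
BUDGET_MS = 150          # assumed threshold before a live call feels awkward


def added_latency_ms(hops):
    """Return the (low, high) added latency in ms for a path with `hops` relays."""
    low, high = PER_HOP_MS
    return hops * low, hops * high


if __name__ == "__main__":
    for label, hops in (("3-hop circuit", 3), ("6-hop onion-service path", 6)):
        low, high = added_latency_ms(hops)
        print(f"{label}: {low}-{high} ms added, budget {BUDGET_MS} ms")
```

Under these assumptions even the best case for three hops uses up the whole budget, which is the point of going store-and-forward.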
I love it for the same reason I love email and text communication. Think about what you want to say before you say it. Exclude the useless tangents: formalities, movie quotes, humble brags, cliches, etc. A few seconds of delay is enough to make even the worst offenders get to the point.
Yes, it's encoding in Opus, and you can optionally configure the encoding quality from 6 kbps to 64 kbps.
I was really surprised at the intelligibility even at 6 kbps.
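To put that bitrate range in perspective for a store-and-forward message, here is a quick size estimate. It is only a sketch: it assumes constant bitrate and ignores container overhead, and the function name and the 10-second clip length are made up for the example.

```python
def opus_payload_bytes(bitrate_kbps, seconds):
    """Approximate audio payload size, assuming constant bitrate and no container overhead."""
    bits = bitrate_kbps * 1000 * seconds   # 1 kbps = 1000 bits per second
    return int(bits / 8)                   # 8 bits per byte


if __name__ == "__main__":
    for kbps in (6, 64):
        size = opus_payload_bytes(kbps, 10)
        print(f"10 s at {kbps} kbps: about {size} bytes")
```

By this estimate a 10-second clip is roughly 7.5 kB at the lowest setting and 80 kB at the highest, so the low end is cheap to push over a slow circuit.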
The caveat is that if you're on Termux, we have to use the separate Termux API application to pipe audio to Termux, and ffmpeg to convert MP4 to Opus. Unfortunately, Termux cannot activate the mic on its own.
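The MP4-to-Opus conversion step could look roughly like the sketch below. This is not the project's actual code: the helper name and file names are hypothetical, and it assumes an ffmpeg build with libopus support. It only builds the command, so nothing runs until you pass it to subprocess.

```python
def ffmpeg_opus_cmd(src, dst, bitrate_kbps=6):
    """Build an ffmpeg command that re-encodes the audio track of `src` to Opus.

    Hypothetical helper; the 6-64 kbps range mirrors the one mentioned above.
    """
    if not 6 <= bitrate_kbps <= 64:
        raise ValueError("bitrate_kbps must be between 6 and 64")
    return [
        "ffmpeg",
        "-i", src,                    # input recording (e.g. MP4 from the recorder)
        "-vn",                        # drop any video stream
        "-c:a", "libopus",            # encode audio with libopus
        "-b:a", f"{bitrate_kbps}k",   # target audio bitrate
        dst,
    ]


if __name__ == "__main__":
    # File names are placeholders; run with subprocess.run(cmd, check=True).
    print(" ".join(ffmpeg_opus_cmd("message.mp4", "message.opus", 6)))
```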