This is frustratingly one-sided writing. Yeah, WebRTC has limitations, but relying on a standard buys you a lot of correctness and reduces long-term engineering cost. The fact that WebRTC is complicated does not mean it is wrong; it means real-time media over the public internet is complicated.
Also, networking is inherently stateful. NAT traversal, jitter buffers, congestion control, packet loss, codec state, encryption, and session routing do not disappear because you put audio over TCP or WebSocket. Pretending otherwise is not architectural clarity. It is just moving the complexity somewhere less visible.
QUIC is also a standard.
“How hard can it be?” the strawman asked.
It’s 2026 and teleconferencing is still such a shit show. There’s billions of dollars to be had and Zoom is at best mediocre, and it can be as bad as Microsoft Whatchamacallit. I’ve never not seen teleconferencing be a ham handed mess.
You might have noticed that the author started the blog post explaining themselves:
I think that they've done more than enough of 'trying the normal way' to be warranted in having an opinion the other way, don't you think?