Did you try it with a lower frame rate on the video?
It seems like that'd be a good way to reduce the compute cost, and if I know I'm talking to a robot then I don't think I'd mind if the video feed had a sort of old-film vibe to it.
Plus it would give you a chance to introduce fun glitch effects (you're obviously into visuals). If you do the same with the audio, without sacrificing actual quality, you could perhaps manage expectations a bit: when you do go over capacity and have to slow down, people are already used to the "fun glitchy Max Headroom" vibe.
Just a thought. I'll check out the video chat as soon as my allegedly human Zoom call ends. :-)
Now that I've tried it out, I find it very Westworld, and I think I would prefer something more plastic, more witty in the way the web site and the launch process are witty. Robot Twin Hassaan was a bit creepy in his Uncanny Valley Ranch.
Up to you, obviously, but I think you might get further by being less creepy while you deal with the technical challenges, and then unveiling your James Delos[0] to the investors when he's more ready.
[0]: https://www.youtube.com/watch?v=EJGgnxTMVd4