> But one overlooked use case of the technology is (talking head) video compression.
> On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity. Indeed, the full LivePortrait model has 130m parameters compared to DCVC’s 20 million. While that’s tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run it in real time (in addition to parameters, a large culprit is using expensive warping operations). That means deploying to edge runtimes such as Apple Neural Engine is still quite a ways ahead.
It’s very cool that this is possible, but the compression use case is indeed a bit far-fetched. An insanely large model that requires the most expensive consumer GPU on both ends, while at the same time being so bandwidth-limited (22 kbps), is a _very_ narrow scenario.
Staying in contact with someone for hours on a metered mobile internet connection comes to mind. Low bandwidth translates to low total data volume over time. If I could video chat on one of those free-internet SIM cards, that would be a breakthrough.
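To put numbers on the "low total data volume" point: a rough sketch using the 22 kbps figure quoted above. The ~500 kbps baseline for a conventional low-bitrate video call is my own assumption, not from the article.

```python
def mb_per_hour(bitrate_bps: float) -> float:
    """Data volume in megabytes for one hour of streaming at a given bitrate."""
    return bitrate_bps * 3600 / 8 / 1e6

# 22 kbps neural codec (from the article) vs. an assumed ~500 kbps
# conventional low-bitrate video call
neural = mb_per_hour(22_000)         # ~9.9 MB/hour
conventional = mb_per_hour(500_000)  # ~225 MB/hour
print(f"{neural:.1f} MB vs {conventional:.0f} MB per hour")
```

Under those assumptions an hour of chatting fits in under 10 MB, which is what makes the metered-SIM scenario plausible at all.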
One use case might be if you have limited bandwidth, perhaps only enough for a voice call, and want to join a video conference. I could imagine dialling in to a conference with a virtual face as an improvement over no video at all.
130m parameters isn’t insanely large, even for smartphone memory. The high GPU usage is a barrier at the moment, but I wouldn’t put it past Apple to have 4090-level GPU performance in an iPhone before 2030.
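The memory side of that claim checks out with simple arithmetic; a quick sketch (the precision choices below are my assumption, since the article doesn't say how the weights are stored):

```python
def model_memory_mb(params: int, bytes_per_param: float) -> float:
    """Approximate weight storage for a model at a given numeric precision."""
    return params * bytes_per_param / 1e6

# 130M parameters at two common precisions (assumed, not from the article)
fp16 = model_memory_mb(130_000_000, 2)  # 260 MB at 16-bit floats
int8 = model_memory_mb(130_000_000, 1)  # 130 MB if quantized to 8-bit
print(f"fp16: {fp16:.0f} MB, int8: {int8:.0f} MB")
```

A few hundred megabytes of weights is well within smartphone memory budgets; as the comment says, it's the compute (and the warping ops), not the parameter count, that's the bottleneck.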
The trade-off may not be worth it today, but the processing power we can expect in the coming years will make this accessible to ordinary consumers. When your laptop or phone or AR headset has the processing power to run these models, it will make more efficient use of limited bandwidth, even if more bandwidth is available. I don't think available bandwidth will scale at the same rate as processing power, but even if it does, the picture will just be that much more realistic.
One cool use would be communication in space, where it's feasible that both sides would have access to high-end compute units but only very limited bandwidth between them.