Hacker News

Perceptually lossless (talking head) video compression at 22kbit/s

211 points | by skandium | 11/08/2024 | 133 comments

Comments

gwd | 11/08/2024

This reminds me of a scene in "A Fire Upon the Deep" (1992) where they're on a video call with someone on another spaceship; but something seems a bit "off". Then someone notices that the actual bitrate they're getting from the other vessel is tiny -- far lower than they should be getting given the conditions -- and so most of what they're seeing on their own screens isn't actual video feed, but their local computer's reconstruction.

zbobet2012 | 11/08/2024

These sorts of models pop up here quite a bit, and they ignore fundamental facts of video codecs (video-specific lossy compression technologies).

Traditional codecs have always focused on trade-offs among encode complexity, decode complexity, and latency, where complexity = compute. If every target device ran a 4090 at full power, we could go far below 22 kbps with traditional codec techniques for content like this. 22 kbps isn't particularly impressive given these compute constraints.

This is my field, and trust me we (MPEG committees, AOM) look at "AI" based models, including GANs constantly. They don't yet look promising compared to traditional methods.

Oh, and benchmarking against a video compression standard that's over twenty years old doesn't do a lot for the plausibility of these methods, either.

LeoPanthera | 11/08/2024

This is very impressive, but “perceptually lossless” isn’t a thing and doesn’t make sense. It means “lossy”.

Vecr | 11/08/2024

Fire Upon the Deep had more or less this. Story important, so I won't say more. That series in general had absolutely brutal bandwidth limitations.

MayeulC | 11/08/2024

I like how the saddle in the background moves with the reconstructed head; it probably works better with uncluttered backgrounds.

This is interesting tech, and the considerations in the introduction are particularly noteworthy. I never considered the possibility of animating 2D avatars with no 3D pipeline at all.

stuaxo | 11/09/2024

Bit off putting that it's Musk for some reason, maybe it's just overexposure to his bullshit, I could quite happily never see him again.

Maybe there is a custom web filter in there somewhere that could block particular people and images of them.

red0point | 11/08/2024

> But one overlooked use case of the technology is (talking head) video compression.

> On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity. Indeed, the full LivePortrait model has 130m parameters compared to DCVC’s 20 million. While that’s tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run it in real time (in addition to parameters, a large culprit is using expensive warping operations). That means deploying to edge runtimes such as Apple Neural Engine is still quite a ways ahead.

It’s very cool that this is possible, but the compression use case is indeed a bit far-fetched. An insanely large model that requires the most expensive consumer GPU on both ends, while at the same time being so limited in bandwidth (22 kbps), is a _very_ limited scenario.

hinkley | 11/08/2024

The second example shown is not perceptually lossless, unless you’re so far on the spectrum you won’t make eye contact even with a picture of a person. The reconstructed head doesn’t look in the same direction as the original.

However, it does raise an interesting property: if you are on the spectrum or have ADHD, you only need one headshot of yourself staring directly at the camera, and then the capture software can stop you from looking at your taskbar or off into space.

AndrewVos | 11/08/2024

Elon weirdly looks more human than usual in the AI version!

initramfs | 11/08/2024

Nice feature for low-bandwidth 4G cell systems.

Reminds me of the video chat in Metal Gear Solid 1 https://youtu.be/59ialBNj4lE?t=21

pastelsky | 11/08/2024

Did not expect to see Emraan Hashmi in this post!

antiquark | 11/08/2024

Not quite lossless... look at the bicycle seat behind him. When he tilts his head, the seat moves with his hair.

JimDabell | 11/08/2024

I got some interesting replies when I suggested this technique here:

https://news.ycombinator.com/item?id=22907718

userbinator | 11/09/2024

> the only information that needs to be transmitted is the change in expression, pose and facial keypoints

Does anyone else remember the weirder (for lack of a better term) features of MPEG-4 part 2, like face and body animation? It did something like that, but as far as I know nearly no one used that feature for anything.

https://en.wikipedia.org/wiki/Face_Animation_Parameter
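The bandwidth appeal of such keypoint-only schemes can be sketched with a back-of-envelope calculation. Every number below is an illustrative assumption (keypoint count, quantization, frame rate), not the actual wire format of LivePortrait or MPEG-4 FAP:

```python
# Back-of-envelope bitrate for keypoint-only talking-head transmission.
# All parameter defaults are illustrative assumptions, not a real codec spec.

def keypoint_bitrate(num_keypoints=21, coords_per_point=3,
                     bits_per_coord=10, fps=25):
    """Raw bits/second needed to send quantized keypoints every frame."""
    return num_keypoints * coords_per_point * bits_per_coord * fps

bps = keypoint_bitrate()
print(f"{bps} bit/s (~{bps / 1000:.1f} kbit/s)")  # 15750 bit/s (~15.8 kbit/s)
```

Even with headroom for entropy-coded residuals and an occasional reference frame, a per-frame payload of a few hundred bits stays far below what pixel-based codecs need, which is what makes a budget in the 22 kbit/s range plausible for this architecture.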

> and in the worst, trust on the internet will be heavily undermined

...as long as the model doesn't include data to put a shoe on one's head.

tommiegannert | 11/08/2024

Now that we're moving towards context-specific compression algorithms, can we please use WASM as the file header for these media files instead of inventing something new? :)
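A minimal sketch of that idea: a container whose "header" is the WASM decoder module itself, followed by the payload that module knows how to decode. The layout (magic string, length field) is entirely hypothetical, not any existing format:

```python
# Hypothetical self-describing container: a WASM decoder module embedded as
# the header, followed by the compressed payload it knows how to decode.
import struct

MAGIC = b"WASMEDIA"  # invented magic number for this sketch

def pack(decoder_wasm: bytes, payload: bytes) -> bytes:
    """Layout: magic | u32 little-endian module length | module | payload."""
    return MAGIC + struct.pack("<I", len(decoder_wasm)) + decoder_wasm + payload

def unpack(blob: bytes) -> tuple[bytes, bytes]:
    """Split a container back into (decoder module, payload)."""
    assert blob[:8] == MAGIC, "not a WASMEDIA container"
    (n,) = struct.unpack_from("<I", blob, 8)
    return blob[12:12 + n], blob[12 + n:]

module, payload = unpack(pack(b"\x00asm\x01\x00\x00\x00", b"compressed-frames"))
```

A player would instantiate the module in a sandbox and feed it the payload; the format can never go stale, because the decoder travels with the data.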

vtodekl | 11/08/2024

[dead]

andrewstuart | 11/08/2024

The more magic AI makes, the less magical the world becomes.

up2isomorphism | 11/08/2024

“Perceptually lossless” is an oxymoron.
