What is the use case? Okay, ultra-low-latency streaming. That is good. But if you are sending the frames over the network via some protocol like WebRTC, they will be touching the CPU anyway. Software encoding of 4K H.264 runs in real time on a single thread of a 65 W, decade-old CPU, with low latency. The CPU encoders are much better quality and more flexible. So it's very difficult to justify the level of complexity needed for hardware video encoding. There's absolutely no need for it for TV streaming, for example. But people who have no need for it keep being obsessed with it.
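Something like this is what I mean, roughly sketched against FFmpeg's libavcodec/libx264 wrapper (the helper name and the exact settings here are illustrative, not from the article):

    /* Single-threaded, low-latency software H.264 encode; assumes an FFmpeg
     * build with libx264 enabled. */
    #include <libavcodec/avcodec.h>
    #include <libavutil/opt.h>

    AVCodecContext *open_x264_lowlatency(int w, int h, int fps)
    {
        const AVCodec *codec = avcodec_find_encoder_by_name("libx264");
        if (!codec) return NULL;

        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        ctx->width        = w;                  /* e.g. 3840 */
        ctx->height       = h;                  /* e.g. 2160 */
        ctx->time_base    = (AVRational){1, fps};
        ctx->pix_fmt      = AV_PIX_FMT_YUV420P;
        ctx->thread_count = 1;                  /* pin the encode to one thread */

        /* ultrafast + zerolatency turns off B-frames, lookahead and frame
           threading, so each frame comes out as soon as it goes in. */
        av_opt_set(ctx->priv_data, "preset", "ultrafast",   0);
        av_opt_set(ctx->priv_data, "tune",   "zerolatency", 0);

        if (avcodec_open2(ctx, codec, NULL) < 0) {
            avcodec_free_context(&ctx);
            return NULL;
        }
        return ctx;
    }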
IMO vendors should stop reinventing hardware video encoding and instead put that programmer time into making libwebrtc and libvpx better suit their particular use case.
The article explains it. This is not for streaming over the web, but for editing professional grade video on consumer hardware.
The article explicitly mentions that mainstream codecs like H264 are not the target. This is for very high bitrate high resolution professional codecs.
I'm not entirely sure that this is true.
I haven't actually looked into this, so it might not be within the realm of possibility. But you are generating the frame on the GPU; if you can also encode it there (whether with NVENC or Vulkan doesn't matter), you can then DMA the encoded bitstream to the NIC while the CPU only builds the packet headers, assuming that can't also be handled by the GPU/NIC.
It will be more energy efficient, and the CPU is free to JIT half a gig of JavaScript in the meantime.
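Roughly what I'm picturing for the encode step, sketched against FFmpeg's NVENC wrapper (the helper and the exact option names are illustrative and depend on the FFmpeg/driver build; the DMA-to-NIC part isn't shown):

    /* Open an h264_nvenc encoder that accepts CUDA (GPU-resident) frames, so
     * the raw pixels never cross PCIe; only the compressed packets come back. */
    #include <libavcodec/avcodec.h>
    #include <libavutil/hwcontext.h>
    #include <libavutil/opt.h>

    AVCodecContext *open_nvenc_gpu(int w, int h)
    {
        const AVCodec *codec = avcodec_find_encoder_by_name("h264_nvenc");
        if (!codec) return NULL;

        AVBufferRef *dev = NULL;
        if (av_hwdevice_ctx_create(&dev, AV_HWDEVICE_TYPE_CUDA, NULL, NULL, 0) < 0)
            return NULL;

        /* Frames context: tells the encoder its input AVFrames live in VRAM. */
        AVBufferRef *frames = av_hwframe_ctx_alloc(dev);
        AVHWFramesContext *fctx = (AVHWFramesContext *)frames->data;
        fctx->format            = AV_PIX_FMT_CUDA;
        fctx->sw_format         = AV_PIX_FMT_NV12;
        fctx->width             = w;
        fctx->height            = h;
        fctx->initial_pool_size = 8;     /* preallocate a few GPU surfaces */
        av_hwframe_ctx_init(frames);

        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        ctx->width         = w;
        ctx->height        = h;
        ctx->time_base     = (AVRational){1, 60};
        ctx->pix_fmt       = AV_PIX_FMT_CUDA;
        ctx->hw_frames_ctx = av_buffer_ref(frames);

        /* Low-latency knobs exposed by the nvenc wrapper. */
        av_opt_set(ctx->priv_data, "preset",      "p1",  0);
        av_opt_set(ctx->priv_data, "tune",        "ull", 0);
        av_opt_set(ctx->priv_data, "zerolatency", "1",   0);

        if (avcodec_open2(ctx, codec, NULL) < 0) {
            avcodec_free_context(&ctx);
            ctx = NULL;
        }
        av_buffer_unref(&frames);
        av_buffer_unref(&dev);
        return ctx;
    }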
If the frames already live on the GPU, pulling them over PCIe just to feed a CPU encoder is wasted bandwidth and latency (a 4K NV12 frame is roughly 12 MB, so 60 fps means on the order of 750 MB/s of transfers before you've encoded anything).
It’s a leftover mindset from the mid-2000s, when GPGPU became possible and additional performance was “unlocked” from otherwise under-utilized silicon.
> If you are sending the frames over the network via some protocol like WebRTC, they will be touching the CPU anyway. Software encoding of 4K H.264 runs in real time on a single thread of a 65 W, decade-old CPU, with low latency.
This is valid for a single stream, but the equation changes when you're trying to squeeze the highest number of simultaneous streams out of the least amount of CapEx possible. Sure, you still have to transfer it back to the CPU just before you send it over WebRTC/HTTP/whatever, but you unlock a lot of capacity by using all the rest of the silicon as much as you can. Being able to use a budget/midrange GPU instead of a high-end, ultra-high-core-count CPU could make a big difference to a business with the right use case.
That said, TFA doesn't seem to be targeting that kind of high-stream-density use case either. I don't think e.g. Frigate NVR users are going to switch to any of the technologies mentioned in this blog post.