These sorts of models pop up here quite a bit, and they ignore fundamental facts about video codecs (video-specific lossy compression technologies).
Traditional codecs have always focused on trade-offs among encode complexity, decode complexity, and latency, where complexity = compute. If every target device ran a 4090 at full power, we could go far below 22 kbps with traditional codec techniques for content like this. 22 kbps isn't particularly impressive once you grant that much compute.
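If you want to see that trade-off for yourself with an off-the-shelf encoder, here's a rough sketch (it assumes an ffmpeg build with libaom-av1, and "talking_head.mp4" is just a hypothetical low-motion clip of your own): it encodes the same source at a fixed 22 kbps target while turning the encoder's effort knob. Dialing -cpu-used down spends more compute per frame and buys quality at the same bitrate.

    # Minimal sketch, assuming ffmpeg with libaom-av1 on PATH and your own test clip.
    import subprocess, time

    SOURCE = "talking_head.mp4"          # hypothetical input clip
    TARGET_BITRATE = "22k"               # match the article's bitrate

    for effort in (8, 6, 4, 2, 0):       # libaom speed presets: 8 = fastest, 0 = slowest
        out = f"out_cpu{effort}.webm"
        cmd = [
            "ffmpeg", "-y", "-i", SOURCE,
            "-an",                        # drop audio; the 22 kbps budget is video only
            "-vf", "scale=512:-2,fps=12", # shrink resolution/framerate, as extreme-low-bitrate encodes usually do
            "-c:v", "libaom-av1",
            "-b:v", TARGET_BITRATE,
            "-g", "300",                  # long GOP: spend bits on few keyframes
            "-cpu-used", str(effort),
            out,
        ]
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        print(f"cpu-used={effort}: {time.perf_counter() - t0:.1f}s")

Compare the outputs visually or with PSNR/VMAF: the slow settings take orders of magnitude longer but look noticeably better at the same bitrate, which is exactly the compute-for-bits trade these models are implicitly making.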
This is my field, and trust me, we (the MPEG committees, AOM) look at "AI"-based models, including GANs, constantly. They don't yet look promising compared to traditional methods.
Oh, and benchmarking against a video compression standard that's over twenty years old doesn't do much for the plausibility of these methods either.
Why so sour? This particular article doesn't seem to ignore much; it even references the Nvidia work that inspired it, as well as a recent benchmark.
Someone was just having fun here; it's not as if they present it as a general-purpose codec.
Really? Would there be a way to replicate this with currently available encoders? I'd like to try it.
This is my field as well, although I come from the neural network angle.
Learned video codecs definitely do look promising: Microsoft's DCVC-FM (https://github.com/microsoft/DCVC) beats the H.266 (VVC) reference encoder in BD-rate. Another benefit of the learned approach is being able to run on soon-to-be-commodity NPUs, without requiring special hardware accommodations.
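For anyone unfamiliar with the metric, here's a rough sketch of how BD-rate (the Bjontegaard delta rate used in claims like that) is typically computed: fit log-rate as a cubic in PSNR for each codec, integrate over the shared quality range, and report the average bitrate difference in percent. The RD points below are made-up placeholders, not real DCVC-FM or VVC numbers.

    import numpy as np

    def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
        """Average % bitrate change of 'test' vs 'anchor' at equal PSNR (negative = test saves bits)."""
        la, lt = np.log(rate_anchor), np.log(rate_test)
        pa = np.polyfit(psnr_anchor, la, 3)      # cubic fit: PSNR -> log(rate)
        pt = np.polyfit(psnr_test, lt, 3)
        lo = max(min(psnr_anchor), min(psnr_test))
        hi = min(max(psnr_anchor), max(psnr_test))
        ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
        it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
        avg_diff = (it - ia) / (hi - lo)         # mean log-rate gap over the shared PSNR range
        return (np.exp(avg_diff) - 1) * 100

    # placeholder RD points: (kbps, PSNR dB) for an anchor codec and a learned codec
    print(bd_rate([100, 200, 400, 800], [32.0, 34.5, 36.8, 38.9],
                  [ 80, 160, 330, 700], [32.2, 34.8, 37.0, 39.1]))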
In the CLIC challenge, hybrid codecs (traditional + learned components) have so far performed best, which has been a letdown for pure end-to-end learned codecs, agreed. But something like H.266 isn't cheap to run right now either.