Good question! Software gets democratized so fast that I am sure others will implement similar approaches soon. And, to be clear, some of our "speed upgrades" are pieced together from recent DiT papers. I do think getting everything running on a single GPU at this resolution and speed is totally new (as far as i have seen).
I think people will just copy it, and we just need to continue moving as fast as we can. I do think that a bit of a revolution is happening right now in real-time video diffusion models. There are so many great papers being published in that area in the last 6 months. My guess is that many DiT models will be real time within 1 year.
One thing that is interesting: LLMs pipelines have been highly optimize for speed (since speed is directly related to cost for companies). That is just not true for real-time DiTs. So, there is still lots of low hanging fruit for how we (and others) can make things faster and better.
Curious about the memory bandwidth constraints here. 20B parameters at 20fps seems like it would saturate the bandwidth of a single GPU unless you are running int4. I assume this requires an H100?
> I do think getting everything running on a single GPU at this resolution and speed is totally new
Thanks, it seemed to be the case that this was really something new, but HN tends to be circumspect so wanted to check. It's an interesting space and I try to stay current but everything is moving so fast. But I was pretty sure I hadn't seen anyone do that. Its a huge achievement to do it first and make it work for real like this! So well done!