Hacker News

storystarling yesterday at 10:30 PM

Curious about the memory bandwidth constraints here. 20B parameters at 20fps seems like it would saturate the bandwidth of a single GPU unless you are running int4. I assume this requires an H100?
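For a rough sense of the numbers behind this question, here is a back-of-envelope sketch. It assumes every forward pass streams all weights from GPU memory exactly once, and that a frame may need more than one pass; `passes_per_frame` is a hypothetical knob, since the thread does not say how many passes the model uses. The H100 figure is the approximate datasheet peak for the SXM part, not a measured number.

```python
# Back-of-envelope estimate of weight-read traffic. Illustrative only.
PARAMS = 20e9   # 20B parameters, as stated in the comment
FPS = 20        # 20 frames per second, as stated in the comment

# Bytes per parameter for a few common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

# Assumption: approximate peak HBM bandwidth of an H100 SXM, in GB/s.
H100_PEAK_GB_PER_S = 3350.0


def weight_traffic_gb_per_s(precision, passes_per_frame=1):
    """GB/s of weight reads if each pass reads every weight once.

    passes_per_frame is hypothetical (for example, several denoising
    steps per frame); the thread does not give this number.
    """
    bytes_per_pass = PARAMS * BYTES_PER_PARAM[precision]
    return bytes_per_pass * passes_per_frame * FPS / 1e9


for precision in BYTES_PER_PARAM:
    for passes in (1, 4):
        traffic = weight_traffic_gb_per_s(precision, passes)
        share = traffic / H100_PEAK_GB_PER_S
        print(f"{precision:>4}, {passes} pass(es)/frame: "
              f"{traffic:7.0f} GB/s ({share:.0%} of assumed H100 peak)")
```

Under these assumptions, fp16 with one pass per frame needs about 800 GB/s for weights alone, while four passes per frame would need about 3,200 GB/s, which is close to the assumed peak. Activations, caches, and batching across frames are ignored here, so treat this as an order-of-magnitude check only.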


Replies

andrew-w yesterday at 10:38 PM

Yep, the model is running on Hopper architecture. Anything less was not sufficient in our experiments.