touisteur • today at 12:11 PM • 0 replies

I thought so, but no: it's an iterative small matrix multiplication kernel on tensor cores, or pure (generative) compute with ultra-late reduction and ultra-small working memory. Nsight Compute says everything stays in L1 or the small register file with no spilling, and that I am compute bound with good ILP. I can't find a way to get more than 10% out of the 300W difference. So I'm asking whether anyone has done better, how, and how reliable the hardware stays.
