Hacker News

pdhborges, last Friday at 9:14 PM

Where is that improvement coming from? The hardware can already compute GEMM as fast as possible.


Replies

leakyfilter, last Friday at 9:17 PM

Raw GEMM computation was never the real bottleneck, especially on the newer GPUs. Feeding the matmuls, i.e. memory bandwidth, is where it’s at.
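
A rough way to see this is a roofline-style back-of-envelope check: compare a GEMM's arithmetic intensity (FLOPs per byte moved) against the machine balance (peak FLOP/s divided by peak bytes/s). The sketch below is only illustrative. The function names are made up for this example, it assumes each matrix crosses memory exactly once with 2-byte elements, and the peak numbers are placeholders rather than the specs of any real GPU.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C = A @ B, with A (m x k), B (k x n), C (m x n).

    Assumes every matrix is read or written exactly once (ideal caching).
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_memory_bound(m, n, k, peak_flops, peak_bandwidth, bytes_per_elem=2):
    """True when the GEMM's intensity is below the machine balance."""
    machine_balance = peak_flops / peak_bandwidth  # FLOPs per byte
    return gemm_arithmetic_intensity(m, n, k, bytes_per_elem) < machine_balance


# Placeholder hardware numbers (assumptions, not a real part).
PEAK_FLOPS = 1.0e15      # FLOP/s
PEAK_BANDWIDTH = 3.0e12  # bytes/s

# Skinny matmul (one row of activations against a big weight matrix).
print(is_memory_bound(1, 8192, 8192, PEAK_FLOPS, PEAK_BANDWIDTH))
# Large square matmul.
print(is_memory_bound(8192, 8192, 8192, PEAK_FLOPS, PEAK_BANDWIDTH))
```

Under these assumptions the skinny case does about one FLOP per byte and sits far below the machine balance, so the compute units wait on memory. The large square case reuses each element many times and lands on the compute side.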