
leakyfilter • last Friday at 9:17 PM • 0 replies

Raw GEMM computation was never the real bottleneck, especially on newer GPUs. Feeding the matmuls, i.e. memory bandwidth, is where it's at.
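
A rough way to see the bandwidth point is a roofline-style back-of-envelope check: compare a GEMM's arithmetic intensity (FLOPs per byte moved) against the machine balance (peak FLOP/s divided by peak bytes/s). The sketch below is only illustrative. It assumes an idealized model in which each matrix is read or written exactly once, 2-byte elements, and the helper names are hypothetical. The peak numbers in the usage lines are round placeholder values, not the specs of any particular GPU.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n] under an idealized model.

    Assumes A and B are each read once and C is written once (no cache
    misses or re-reads), so real kernels will usually do worse than this.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_memory_bound(m, n, k, peak_flops, peak_bandwidth, bytes_per_elem=2):
    """True if the GEMM's intensity falls below the machine balance point."""
    machine_balance = peak_flops / peak_bandwidth  # FLOPs per byte
    return gemm_arithmetic_intensity(m, n, k, bytes_per_elem) < machine_balance


# Placeholder hardware numbers (assumptions, not real specs):
PEAK_FLOPS = 1e15      # 1 PFLOP/s
PEAK_BANDWIDTH = 3e12  # 3 TB/s

skinny = is_memory_bound(1, 4096, 4096, PEAK_FLOPS, PEAK_BANDWIDTH)
square = is_memory_bound(4096, 4096, 4096, PEAK_FLOPS, PEAK_BANDWIDTH)
```

Under these assumptions the skinny case (m = 1, essentially a matrix-vector product) sits at about 1 FLOP/byte and is firmly bandwidth bound, while the large square case only escapes the bandwidth limit if the kernel actually achieves that ideal data reuse.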
