The chart confused me because I expected to see raw performance numbers for CUDA-L2 alongside the others, but instead it shows the percentage speedup of CUDA-L2 over each baseline. In some sense, the bar chart inverts the performance of torch.matmul and cuBLAS: the slower the baseline, the taller its bar, and 0% would mean equal performance, not zero performance.
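For what it's worth, here's how I ended up reading the bars (a sketch assuming "speedup" has the usual definition of baseline time over new time, minus one; the numbers are made up for illustration):

```python
def speedup_pct(baseline_time_ms: float, new_time_ms: float) -> float:
    """Percentage speedup of the new kernel relative to a baseline.

    0% means both take the same time; 100% means the new kernel
    runs in half the baseline's time.
    """
    return (baseline_time_ms / new_time_ms - 1.0) * 100.0

# Hypothetical timings, just to show how the bars invert:
print(speedup_pct(2.0, 2.0))  # 0.0   -> equal performance
print(speedup_pct(2.0, 1.0))  # 100.0 -> baseline is twice as slow
print(speedup_pct(3.0, 1.0))  # 200.0 -> an even slower baseline, taller bar
```

So a taller bar over torch.matmul says more about torch.matmul being slow than about CUDA-L2 being fast in absolute terms.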
Am I reading this wrong, or does this only support FP16 inputs and compare its performance against an FP32 solver?
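To make the concern concrete, this is the apples-to-oranges comparison I'm worried about (a PyTorch sketch, assuming the baseline runs in FP32 while the new kernel gets FP16 inputs; sizes are arbitrary):

```python
import torch

a32 = torch.randn(4096, 4096, device="cuda", dtype=torch.float32)
b32 = torch.randn(4096, 4096, device="cuda", dtype=torch.float32)
a16, b16 = a32.half(), b32.half()

def time_ms(fn, iters=50):
    # Warm up, then time with CUDA events so we measure GPU work,
    # not just launch overhead.
    for _ in range(5):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# FP16 tensor cores beat FP32 on most recent GPUs regardless of any
# clever new optimization, so this gap alone proves nothing.
print("fp32 matmul:", time_ms(lambda: torch.matmul(a32, b32)), "ms")
print("fp16 matmul:", time_ms(lambda: torch.matmul(a16, b16)), "ms")
```

If the baselines aren't run at the same precision, a big chunk of the reported speedup could just be the dtype.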
They claim the algorithm "discovered" the new techniques, but the methods described in section 5 do not seem all that novel to me. It smells like it could be "laundering" the literature [1] and reshuffling existing techniques. That is not inherently a bad thing, but if it is borrowing existing techniques, I would hope the appropriate citations eventually make it into this paper.
[1]: https://www.argmin.net/p/lore-laundering-machines