CUDA optimization actually doesn’t suck that much. I think Nsight is amazing and super helpful for profiling and identifying bottlenecks in kernels.
Totally, Nsight is great. We do something similar: generate kernels, profile them on real GPUs, then optimize based on that :D