There also exists cuda-gdb[1], a first-party GDB for NVIDIA's CUDA. I've found it to be pretty good. Since CUDA uses a threading model, it works well with the GDB thread ergonomics (though you can only single-step at the warp granularity IIRC by the nature of SM execution).