Regarding the section on Python and high-level CUDA, anyone interested should maybe first take a peek at Warp, which I’m guessing is too new to have a book yet. Warp lets you write CUDA kernels directly in Python, and it’s a breeze to get started. https://github.com/nvidia/warp
"AI Systems Performance Engineering" might deserve a mention, even though it's not strictly CUDA.
I liked going through https://www.olcf.ornl.gov/cuda-training-series/ for an intro and some fundamentals.
First one I clicked on is 404: Programming Massively Parallel Processors: A Hands-on Approach (3rd Edition) https://www.cambridge.org/core/books/programming-in-parallel...
Increasingly (for instance on the ADSP podcast [1]), those in NVIDIA's inner circle are advocating against writing your own CUDA kernels (unless that's your full-time job at NVIDIA, that is).
Any good MOOCs on parallel programming or NVIDIA GPUs?
In an age when your company mandates that you raise your productivity by hundreds of percent right now using LLMs, how do you find an excuse to sit down and read a book?
Having read or at least skimmed most of those books, I think the best intro is 'CUDA Programming: A Developer's Guide to Parallel Computing with GPUs'.
Programming Massively Parallel Processors: A Hands-on Approach is not really good in my opinion: it has many small mistakes and confusing sentences (even when you know CUDA).
CUDA by Example: An Introduction to General-Purpose GPU Programming is too simple and abstracts away too much of the architecture.
Next year I'm planning to start writing a CUDA book that starts by engineering the hardware and goes up to optimizing for that hardware (which is basically an NVIDIA card), including all the main algorithms (except for graphs).
I'm already teaching the course this way at uni, and it is quite successful with students.