As someone from the rendering side of GPU stuff, what exactly is the point of ROCm/CUDA? We already have Vulkan and SPIR-V with vendor extensions as a mostly-portable GPU API; what do these APIs do differently?
Furthermore, don't people use PyTorch (and other libraries? I'm not really clear on what ML tooling is like; it feels like there are hundreds of frameworks and I haven't seen any simplified list explaining the differences. I would love a TLDR for this) and not ROCm/CUDA directly anyway? So the main draw can't be ergonomics, at least.
Vulkan doesn't do C++ as a shading language, for example. There are some backend attempts to target SPIR-V, but it's still early days and nowhere close to having the IDE integration, graphical debugging tools, and rendering libraries that CUDA enjoys.
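To make that concrete, here's a minimal sketch of what C++-as-the-kernel-language buys you in CUDA (the lerp example is mine, not from any product below): an ordinary template shared between host and device code, which GLSL/HLSL-to-SPIR-V pipelines don't give you out of the box.

    // An ordinary C++ template, usable from both host and device code.
    template <typename T>
    __host__ __device__ T lerp(T a, T b, T t) { return a + t * (b - a); }

    // A kernel is just a C++ function marked __global__; it can call the
    // same templated code the CPU side uses.
    __global__ void lerp_kernel(const float* a, const float* b, float t,
                                float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = lerp(a[i], b[i], t);
    }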
Examples of rendering solutions using CUDA:
https://www.nvidia.com/en-us/design-visualization/solutions/...
https://home.otoy.com/render/octane-render/
It is definitely ergonomics and tooling.
Cuda the language is an antique dialect of C++ with a vectorisation hack. It's essentially what you get if you take an auto-vectoriser and turn off the correctness precondition: you write scalar code for a single thread, the hardware runs thousands of copies of it in lockstep, and the correct semantics are defined to be whatever you get if you ignore dataflow between those threads. This was considered easier to program with than vector types and intrinsics.
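A sketch of what that looks like in practice (my example, not the parent's): the kernel body is scalar code for one element, the launch configuration fans it out, and nothing in the model verifies the threads are actually independent.

    // Scalar-looking code for one element; the <<<grid, block>>> launch
    // runs it across thousands of threads at once.
    __global__ void axpy(float a, const float* x, float* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's index
        if (i < n) y[i] = a * x[i] + y[i];  // fine: one thread per element
        // y[0] += a;  // would compile just as happily: a data race,
        //             // i.e. the "ignore dataflow" semantics above
    }

    // host side, e.g.: axpy<<<(n + 255) / 256, 256>>>(2.0f, d_x, d_y, n);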
Cuda the ecosystem is a massive pile of libraries for lots of different domains written to make it easier to use GPUs to do useful work. This is perhaps something of a judgement on how easy it is to write efficient programs using cuda.
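For instance, the usual way to multiply matrices is not to write the kernel yourself but to call cuBLAS, one of those domain libraries (a sketch; handle setup and error checking omitted):

    #include <cublas_v2.h>

    // C = alpha*A*B + beta*C for n x n column-major matrices already on
    // the device; cuBLAS picks a tuned kernel for the hardware.
    void gemm(cublasHandle_t handle, const float* d_A, const float* d_B,
              float* d_C, int n) {
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, d_A, n, d_B, n, &beta, d_C, n);
    }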
ROCm contains a language called HIP, which behaves pretty similarly to Cuda. OpenCL is the same sort of thing as well. It also contains a lot of library code, in this case because people using Cuda use those libraries and don't want to reimplement them. That's a bit of a challenge because nvidia spent 20 years writing these libraries and is still writing more, yet amd is expected to produce the same set in an order of magnitude less time.
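How similar? The runtime APIs map almost one to one; ROCm's hipify tools do essentially this renaming mechanically. A sketch of the correspondence:

    // CUDA                                 HIP
    #include <cuda_runtime.h>            // #include <hip/hip_runtime.h>

    cudaMalloc(&d_x, bytes);             // hipMalloc(&d_x, bytes);
    cudaMemcpy(d_x, h_x, bytes,          // hipMemcpy(d_x, h_x, bytes,
        cudaMemcpyHostToDevice);         //     hipMemcpyHostToDevice);
    kernel<<<blocks, threads>>>(d_x);    // kernel<<<blocks, threads>>>(d_x);
    cudaDeviceSynchronize();             // hipDeviceSynchronize();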
If you want to use a GPU to do maths, you don't actually need any of this stuff. You need the GPU, something to feed it data (e.g. a linux host) and some assembly. Or LLVM IR / freestanding C++ if you prefer. This whole cuda / rocm thing really is intended to make GPUs easier to program.
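Concretely, clang can compile a kernel straight to PTX with none of the CUDA toolkit installed. A minimal sketch (flag and builtin names are from upstream clang's CUDA support; details vary by version):

    // kernel.cu: freestanding, no SDK headers
    __attribute__((global)) void scale(float* x, float s) {
        int i = __nvvm_read_ptx_sreg_tid_x();  // raw "thread id" register
        x[i] *= s;
    }

    // $ clang++ -x cuda --cuda-device-only --cuda-gpu-arch=sm_70 \
    //           -nocudainc -nocudalib -S kernel.cu -o kernel.ptx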
users mainly use PyTorch and Jax, and these days rarely write CUDA code directly.
however, separately, installing drivers and the correct CUDA/cuDNN library versions is the responsibility of the user. this is sometimes slightly finicky.
with ROCm, the problem is that:
1) PyTorch/Jax don't support it very well, for whatever reason, which may be partly the quality of ROCm frustrating the PyTorch/Jax devs;
2) installing drivers and libraries is a nightmare: it's all poorly documented and constantly broken;
3) hardware support is very spotty and confusing.