> But moving from CUDA to ROCm is often more like a rewrite than a recompile.
Isn't everyone* in this segment just using PyTorch for training, or wrappers like Ollama/vllm/llama.cpp for inference? None have a strict dependency on Cuda. PyTorch's AMD backend is solid (for supported platforms, and Strix Halo is supported).
* enthusiasts whose budget is in the $5k range. If you're vendor-locked to CUDA, Mac Mini and Strix Halo are immediately ruled out.