In my experience, AI on AMD under Linux is a nightmare. There is a myriad of custom things you need to do, and even then it just breaks after a while. This happened so often on my current setup (6600 XT) that I don't bother with local AI anymore; the time investment is just not worth it.
It's not that I can't live like this, and I still have the same card, but if I were looking to do anything AI-related locally with a new card, it certainly wouldn't be an AMD one.
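To give a taste of the kind of custom setup I mean, here's a rough sketch for PyTorch. The HSA_OVERRIDE_GFX_VERSION spoof is the usual community workaround for RDNA2 cards like the 6600 XT (gfx1032) that aren't on ROCm's official support list; it's folk knowledge, not anything AMD documents, so take it as an assumption:

    # Sketch: getting a ROCm build of PyTorch to use a 6600 XT (gfx1032).
    # The card isn't officially supported, so the common workaround is to
    # spoof it as gfx1030, i.e. version 10.3.0.
    import os
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")  # set before the HIP runtime loads

    import torch  # assumes a ROCm wheel, e.g. from pytorch.org's rocm index

    # ROCm builds of PyTorch expose the HIP backend through the torch.cuda API.
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
    else:
        print("HIP runtime not picked up; running on CPU")

And that's the happy path; a kernel or ROCm update can silently break it again.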
I set up a deep learning station probably 5-10 years ago and ran into the exact same issue. After a week of pulling out my hair, I just bought an Nvidia card.
Are you referring to AI training, prediction/inference, or both? Could you give some examples of what had to be done and why? Thanks in advance.
I don't have much experience with ROCm for large training runs, but NVIDIA is still shit when it comes to matching the driver, the CUDA version, and everything else. The only simplification comes from Ubuntu and other distros that already do the heavy lifting by installing all the required components without much configuration.
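A quick sanity check worth running after any driver or CUDA bump (just a sketch; the point is that torch.version.cuda reports what the wheel was built against, which the installed driver has to support):

    import torch

    # The CUDA version bundled in the wheel must be one the kernel driver
    # supports; a mismatch is the classic "installs fine, fails at runtime" case.
    print("torch built against CUDA:", torch.version.cuda)
    print("driver/runtime usable:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))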