
sorenjan 01/21/2025

ROCm is a mistake. It's fundamentally broken by compiling to hardware-specific code instead of a virtual ISA like CUDA's PTX, so it will always be plagued with this issue of not supporting all cards, and even if a certain GPU is supported today they can stop supporting it in the next version. It has happened before, and it will continue to happen.

It's also a strange value proposition. If I'm a programmer at some supercomputer facility and my boss has bought a new CDNA-based machine, fine, I'll write AMD-specific code for it. Otherwise, why should I? If I want to write proprietary GPU code I'll probably use the de facto industry standard from the industry giant and pick CUDA.

AMD could be collaborating with Intel and a myriad of other companies and organizations and focus on a good open cross-platform GPU programming platform. I don't want to have to think about who makes my GPU! I recently switched from an Intel CPU to an AMD one, with no problem at all. If I had needed new software written specifically for AMD processors I would have just bought a new Intel instead, even though AMD is leading in performance at the moment. Even Windows on ARM seems to work OK, because most things aren't written in x86 assembly anymore.

Get behind SYCL, stop with the platform-specific compilation nonsense, and start supporting consumer GPUs on Windows. If you provide a good base, the rest of the software community will build on top of it. This should have been done ten years ago.
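To make the SYCL suggestion concrete, here is a minimal vendor-neutral kernel sketch (SYCL 2020 style). It needs a SYCL compiler such as DPC++ or AdaptiveCpp to build, so treat it as an illustration of the programming model rather than drop-in code; the sizes and values are arbitrary:

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  // The default selector picks whatever device is present -- NVIDIA, AMD,
  // Intel, or a CPU fallback -- with no vendor-specific code path.
  sycl::queue q{sycl::default_selector_v};

  std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024, 0.0f);
  {
    sycl::buffer bufA{a}, bufB{b}, bufC{c};
    q.submit([&](sycl::handler& h) {
      sycl::accessor A{bufA, h, sycl::read_only};
      sycl::accessor B{bufB, h, sycl::read_only};
      sycl::accessor C{bufC, h, sycl::write_only};
      // One work-item per element: C[i] = A[i] + B[i].
      h.parallel_for(sycl::range<1>{1024},
                     [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
    });
  } // Buffer destructors synchronize results back into the vectors.
}
```

The point is that nothing here names a GPU architecture; which device runs the kernel is a runtime decision, which is exactly the property the comment is asking for.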


Replies

frognumber 01/21/2025

Agreed.

Honestly, the problem isn't just which devices, but even more so, this (from the page, not your comment):

> No guarantees of future support but we will try hard to add support.

During the Great GPU Shortage, I bought an AMD RX5xx card for ML work. It was explicitly advertised to work with ROCm. Within a couple of months, AMD dropped ROCm support. EOLing an actively sold product for its advertised purpose within the warranty period was, if I understand the consumer protection laws in my state correctly, fraud. There was no support from the card vendor (MSI), none from AMD, and none from the reseller. Short of small claims court, which was not worth it, there was no recourse.

This is on a long list of issues AMD needs to sort out to be a credible player in this space:

* Those are the kinds of experiences which cause people to drop a vendor and not look back. AMD needs to either support cards forever, or at the very least, have an advertised expiration date (like Chromebooks and Android phones).

* Broad support is helpful from a consumer perspective for the simply pragmatic reason that only a tiny fraction of the population has the time to read online forums, footnotes, or fine print. People should be able to buy a card on Amazon, at Best Buy, or at Microcenter, and expect things to Just Work.

* Being able to plan is essential for enterprise use. I can't build a system around AMD if AMD might stop supporting its platform with zero days' notice, while the next day there might be a security exploit that requires a version bump.

I'm hoping Intel gets their act together here, since NVidia needs a credible competitor. I've given up on AMD.

magic_at_nodai 01/21/2025

PTX does provide a low-level machine abstraction, but you still target some version of the hardware ( https://arnon.dk/matching-sm-architectures-arch-and-gencode-... ). A lot of software effort has gone into making it look and work seamless.
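Concretely, a CUDA binary can carry both precompiled machine code (SASS) for named SM versions and the PTX itself, which the driver JIT-compiles forward onto newer GPUs; a ROCm/HIP binary instead enumerates concrete gfx targets. A sketch of the two invocations (the flags are real; `kernel.cu`, `kernel.cpp`, and the chosen architectures are arbitrary examples):

```
# CUDA: emit SASS for sm_80 plus PTX for compute_80; the embedded PTX
# lets the driver JIT the kernel on GPUs newer than the build toolchain.
nvcc -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_80,code=compute_80 kernel.cu -o app

# ROCm/HIP: each supported GPU is named explicitly; a card whose gfx ID
# is absent from the list cannot run the resulting binary.
hipcc --offload-arch=gfx908 --offload-arch=gfx90a kernel.cpp -o app
```

This is the asymmetry the parent comment is pointing at: one toolchain degrades gracefully to a virtual ISA, the other is a closed list of devices.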

Though AMD doesn't have the same "virtual ISA" as PTX, there are now increasing levels of such abstraction available in compilation flows built on MLIR / Linalg etc. Those are higher level and can be compiled / JITed at runtime, which obviates the need for a low-level virtual ISA.

danjl 01/21/2025

We already fought and lost this battle with 3D APIs for GPUs. What makes you think that strategy would play out any differently for tensor processing?