Hacker News

Taking on CUDA with ROCm: 'One Step After Another'

131 points by mindcrime | yesterday at 10:38 PM | 95 comments

Comments

lrvick | yesterday at 11:49 PM

Just spent the last week or so porting TheRock to stagex in an effort to get ROCm built with a native musl/mimalloc toolchain and make it deterministic, for high-security/privacy workloads that cannot trust binaries built with only a single compiler.

It has been a bit of a nightmare: I had to package 30+ dependencies plus their heavily customized LLVM, but I finally got the runtime to build this morning.

Things are looking bright for high-security workloads on AMD hardware, because they work fully in the open, however much of a mess it may be.
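
The "cannot trust binaries built with a single compiler" concern is usually checked by building the same source independently (e.g. with two different toolchains) and comparing the outputs bit for bit. A minimal Python sketch of that comparison step; the artifact paths and workflow are illustrative, not stagex's actual tooling:

```python
# Hypothetical determinism check: hash artifacts from two independent
# builds of the same source and accept them only if every pair matches.
import hashlib
from pathlib import Path

def artifact_digest(path: Path) -> str:
    """SHA-256 of a build artifact, read in chunks to handle large files."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def builds_are_deterministic(build_a: list[Path], build_b: list[Path]) -> bool:
    """Pairwise-compare two independently built artifact sets."""
    if len(build_a) != len(build_b):
        return False  # different artifact sets can never match
    return all(artifact_digest(a) == artifact_digest(b)
               for a, b in zip(build_a, build_b))
```

Any mismatch points at nondeterminism (embedded timestamps, build paths, compiler differences) that would let a single compromised toolchain hide a payload.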

0xbadcafebee | today at 1:57 AM

AMD has years of catching up to do with ROCm just to get their devices working well. They don't support all of their own graphics cards that can do AI, and when a card is supported, it's buggy. The AMDGPU graphics driver for Linux has had continued instability since kernel 6.6. I don't understand why they can't hire better software engineers.

rdevilla | today at 1:27 AM

ROCm is not supported on some very common consumer GPUs, e.g. the RX 580. Vulkan backends work just fine.

suprjami | today at 4:35 AM

Just in time for Vulkan tg (token generation) to be faster in almost all situations, and Vulkan pp (prompt processing) to be faster in many situations, with constant improvements on the way, making ROCm obsolete for inference.
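
As a rough illustration of that kind of backend comparison, here is a hedged Python sketch; the backend names mirror llama.cpp's build options, and the throughput numbers are invented placeholders, not measurements:

```python
# Hypothetical tokens/sec results for two llama.cpp backends on the same
# model. pp512 = prompt processing over 512 tokens, tg128 = generating
# 128 tokens. All numbers below are made up for illustration.
results = {
    "vulkan": {"pp512": 812.0, "tg128": 64.3},
    "rocm":   {"pp512": 905.0, "tg128": 58.1},
}

def faster_backend(metric: str) -> str:
    """Return the backend with the higher throughput for a given metric."""
    return max(results, key=lambda backend: results[backend][metric])

for metric in ("pp512", "tg128"):
    winner = faster_backend(metric)
    ratio = results[winner][metric] / min(r[metric] for r in results.values())
    print(f"{metric}: {winner} is {ratio:.2f}x faster")
```

With these placeholder numbers the comparison would come out split: ROCm ahead on prompt processing, Vulkan ahead on token generation, which is the shape of result the comment is describing.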

jmward01 | today at 4:07 AM

I really want to get to the point where I'm shopping online for a GPU and Nvidia isn't a requirement. I think we are really close. Maybe we are already there and my level of trust just needs to catch up.

pjmlp | today at 5:11 AM

They need lots of steps: hardware support, IDE and graphical debugging integrations, the polyglot ecosystem, a common bytecode shared by several compiler backends (CUDA is not only C++), and the library portfolio.

p1esk | today at 12:51 AM

Someone from AMD posted this a few minutes ago, then deleted it:

"Anush's success is due to opting out of internal bureaucracy than anything else. most Claude use at AMD goes through internal infrastructure that can take hundreds of seconds per response due to throttling. Anush got us an exemption to use Anthropic directly. he is also exempt from normal policies on open source and so I can directly contribute to projects to add AMD support. He's an effective leader and has turned ROCm into a internal startup based in California. Definitely worth joining the team even if you've heard bad things about AMD as a whole."

This kind of bullshit is why I don't want to join AMD, even if this particular team is temporarily exempt from it.

roenxi | today at 1:02 AM

> Challenger AMD’s ability to take data center GPU share from market leader Nvidia will certainly depend on the success or failure of its AI software stack, ROCm.

I don't think this is true. CUDA is a huge advantage for Nvidia, but as far as I can tell it is more a set of R&D libraries than anything else, so all the Hot New Stuff keeps being Nvidia-first and Nvidia-only (to start with) because the library ecosystem for the hotness doesn't exist yet. Then eventually new libraries are created that are CUDA-independent, and AMD turns out to make pretty good graphics cards.

I wouldn't be surprised if ROCm withered on the vine and AMD still did fine.

hurricanepootis | today at 1:02 AM

I've been using ROCm on my Radeon RX 6800 and my Ryzen AI 7 350 systems. I've only used it for GPU-accelerated rendering in Cycles, but I am glad that AMD has an option that isn't OpenCL now.

bruce343434 | today at 1:33 AM

In my experience fiddling with compute shaders a long time ago, CUDA, ROCm, and OpenCL are way too much hassle to set up. It usually takes a few hours to get the toolkits and SDKs up and running; that is, if you CAN get them up and running. The dependencies are way too big as well: CUDA is 11 GB??? Either way, just use Vulkan. Vulkan "just works" and doesn't lock you into Nvidia/AMD.

superkuh | yesterday at 11:32 PM

AMD hasn't signaled, in behavior or words, that they're going to actually support ROCm on $specificdevice for more than 4-5 years after release. Sometimes it's as little as the high 3.x years for shrinks like the consumer AMD RX 580. And often ROCm support for consumer devices isn't out until a year after release, further cutting into that window.

Meanwhile, Nvidia just dropped CUDA/driver support for 1xxx-series cards from their most recent drivers this year.

For me ROCm's mayfly lifetime is a dealbreaker.

alecco | today at 12:05 AM

Apple got it right: unified memory with a wide bus. That's why Mac Minis are flying off the shelves for local models. But they are 10x less powerful in AI TOPS, and you can't upgrade the memory.

I really wish the AMD and Intel boards would be replaced by competent people. They could do it in a very short time. Both have integrated GPUs that use main memory, and AMD and Intel have (or at least used to have) serious know-how in data buses and interconnects, respectively. But I don't see any of that happening.

ROCm? It can't even support decent attention. It lacks a lot of features, and NVIDIA is adding more each year. Soon they will reach escape velocity and nobody will catch them for a decade. smh

ycui1986 | today at 1:11 AM

For many LLM workloads, it seems ROCm is slower than Vulkan. What's the point?

shmerl | yesterday at 11:43 PM

Side question, but why not advance something like Rust GPU instead as a general approach to GPU programming? https://github.com/Rust-GPU/rust-gpu/

From all the existing examples, it really looks like the most interesting option.

What I'm surprised about is the lack of backing for it from someone like AMD. It doesn't have to immediately replace ROCm, but AMD would benefit from it advancing and replacing the likes of CUDA.


blovescoffee | yesterday at 11:28 PM

Naive question: could agents help speed up writing code to bring ROCm to parity with CUDA? Outside of code, what are the bottlenecks to reaching parity?

nnevatie | today at 3:11 AM

Why is it called "ROCm" (with the strange capitalization) in the first place? This may sound silly, but in order to compete, every detail matters, including the name.

show 3 replies