> The GPU and CPU share memory, but that doesn't mean you no longer need to interact with the GPU.
But we already have software that talks to the GPU: mesa3d and the ecosystem around it. It has existed for decades. My understanding was that the main reason not to use it was that memory management was too complicated, and CUDA solved that problem.
If memory gets unified, what is the value proposition of ROCm supposed to be over mesa3d? Why does AMD need to invent some new way to communicate with GPUs? Why would it be faster?
> If memory gets unified, what is the value proposition of ROCm supposed to be over mesa3d? Why does AMD need to invent some new way to communicate with GPUs? Why would it be faster?
And what about memory barriers? How do you keep a CPU core's L1/L2 caches in sync with the GPU's caches?
Exactly. That's what a ROCm memory barrier is for: it lets the CPU and GPU run in parallel while providing a mechanism for synchronization.
GPU and CPU can share memory, but they do not share caches. You need programming effort to make ANY of this work.
> CUDA solved that problem.
CUDA is a proprietary Nvidia product. CUDA solved the problem for Nvidia chips.
On AMD GPUs, you use ROCm. On Intel, you use OpenVINO. On Apple silicon, you use MLX. All work fine for the common AI tasks you'd want to do on self-hosted hardware. CUDA was there first and so has a more mature ecosystem, but so far I've found zero models or tasks I haven't been able to run with ROCm. llama.cpp works fine. ComfyUI works fine. The Transformers library works fine. LM Studio works fine.
Unless you believe Nvidia having a monopoly on inference or training AI models is good for the world, you can't oppose all the other GPU makers having a way for their chips to be used for those purposes. CUDA is a proprietary vendor-specific solution.
Edit: But, also, Vulkan works fine on the Strix Halo. It is reliable and usually not much slower than ROCm (and occasionally faster, somehow). Here are some benchmarks: https://kyuz0.github.io/amd-strix-halo-toolboxes/