Hacker News

ashleyn · last Saturday at 9:21 PM · 9 replies

How is AMD GPU compatibility with leading generative AI workflows? I'm under the impression everything is CUDA.


Replies

ftvkyo · last Sunday at 10:57 AM

There is a project called SCALE that allows building CUDA code natively for AMD GPUs. It is designed as a drop-in replacement for Nvidia CUDA, and it is free for personal and educational use.

You can find out more here: https://docs.scale-lang.com/stable/

There are still many things that need implementing, the most important being cuDNN and the CUDA Graph API, but in my opinion the list of things that are already supported is quite impressive (and keeps improving): https://github.com/spectral-compute/scale-validation/tree/ma...

Disclaimer: I am one of the developers of SCALE.

Aeolun · last Sunday at 10:55 AM

Ollama and all the Stable Diffusion-based stuff now work on my AMD cards. Maybe it’s different if you want to actually train things, but I no longer have any issues running anything that fits in memory.
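
For what it's worth, the client side doesn't care which backend Ollama was built against; here is a minimal sketch against Ollama's local HTTP API (the model name is an assumption, substitute whatever you have pulled):

    # Minimal sketch: query a local Ollama server on its default port (11434).
    # The GPU backend (CUDA, ROCm, Vulkan, CPU) is invisible at this level;
    # the model name "llama3.1" is an assumption -- use one you have pulled.
    import json
    import urllib.request

    payload = json.dumps({
        "model": "llama3.1",
        "prompt": "Explain ROCm in one sentence.",
        "stream": False,
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])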

pja · last Saturday at 9:34 PM

llama.cpp combined with Mesa’s Vulkan support for AMD GPUs has worked pretty well with everything I’ve thrown at it.
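
If you want to try it from Python, the sketch below assumes llama-cpp-python was installed with the Vulkan backend enabled (e.g. CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python) and that a GGUF model file exists locally; the path and prompt are placeholders:

    # Sketch under assumptions: llama-cpp-python built with the Vulkan backend,
    # and ./model.gguf pointing at a real GGUF file (both are placeholders).
    from llama_cpp import Llama

    llm = Llama(
        model_path="./model.gguf",  # any quant that fits in VRAM
        n_gpu_layers=-1,            # offload all layers to the GPU
    )
    out = llm("Q: Does this run on AMD? A:", max_tokens=64)
    print(out["choices"][0]["text"])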

nh43215rgb · last Sunday at 10:38 AM

In practical generative AI workflows (LLMs), I think AMD Max+395 chips with unified memory are as good as Mac Studio or MacBook Pro configurations at handling big models locally, and they support fast inference speeds (however, top-end Apple silicon (M4 Max, Studio Ultra) can reach 546 GB/s of memory bandwidth, while the AMD unified memory system is around 256 GB/s). I think for inference either will work fine. For everything else I think the CUDA ecosystem is a better bet (correct me if I'm wrong).
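
Those bandwidth figures roughly bound decode speed, since every generated token streams the active weights through memory. A back-of-envelope sketch, assuming a ~40 GB model (a 4-bit quant of a 70B-class model) and ignoring compute and cache effects:

    # Back-of-envelope only: token generation is roughly memory-bandwidth bound,
    # so tokens/s <= bandwidth / bytes read per token (~ model size for a dense
    # model). The 40 GB figure assumes a 4-bit quant of a 70B-class model.
    model_bytes = 40e9

    for name, bw in [("AMD unified memory (~256 GB/s)", 256e9),
                     ("Apple M4 Max (~546 GB/s)", 546e9)]:
        print(f"{name}: ~{bw / model_bytes:.1f} tokens/s upper bound")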

sbinnee · last Sunday at 10:46 AM

My impression is the same. To train anything you just need CUDA GPUs. For inference, I think AMD and Apple M chips are getting better and better.

DiabloD3 · last Saturday at 11:45 PM

CUDA isn't really used for new code. It's used for legacy codebases.

In the LLM world, you really only see CUDA being used by Triton and/or PyTorch consumers that haven't moved on to better pastures (mainly because they only know Python and aren't actually programmers).

That said, AMD can run most CUDA code through ROCm, and AMD officially supports Triton and PyTorch, so even the academics have a way out of Nvidia hell.
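
To make the Triton/PyTorch point concrete: the same kernel source runs on NVIDIA and, with AMD's Triton/ROCm support, on AMD GPUs, since PyTorch's ROCm build still exposes the GPU under the "cuda" device name. A vector-add sketch in the style of the Triton tutorial:

    # Sketch in the style of the Triton tutorial vector add; the same source
    # runs on NVIDIA (CUDA) and AMD (ROCm) builds of PyTorch/Triton.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x, y):
        out = torch.empty_like(x)
        n = out.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    # On the ROCm build, "cuda" is still the device name for AMD GPUs.
    x = torch.rand(4096, device="cuda")
    y = torch.rand(4096, device="cuda")
    print(torch.allclose(add(x, y), x + y))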

dismalaf · last Sunday at 5:18 PM

> I'm under the impression everything is CUDA

A very quick Google search would show that pretty much everything also runs on ROCm.

Torch runs on CUDA and ROCm. Llama.cpp runs on CUDA, ROCm, SYCL, Vulkan and others...
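
A quick way to see which backend a given PyTorch build was compiled against (the attributes below exist in current builds, as far as I know):

    # Backend check: CUDA wheels set torch.version.cuda, ROCm wheels set
    # torch.version.hip; both report the GPU through torch.cuda.*.
    import torch

    print("CUDA runtime:", torch.version.cuda)   # None on ROCm builds
    print("HIP runtime: ", torch.version.hip)    # None on CUDA builds
    print("GPU visible: ", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device name: ", torch.cuda.get_device_name(0))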

trenchpilgrim · last Saturday at 9:24 PM

Certain chips can work with useful local models, but compatibility is far behind CUDA.

wolfgangK · last Saturday at 9:24 PM

Indeed, recent Flash Attention is a pain point for non-CUDA hardware.
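
To illustrate: recent PyTorch lets you request the flash-attention SDPA backend explicitly, and on builds or hardware where no flash kernel is wired up the call errors instead of silently falling back (exact behaviour varies by version and GPU, so treat this as a sketch):

    # Sketch: explicitly request the flash-attention backend for
    # scaled_dot_product_attention (needs a fairly recent PyTorch). Where no
    # flash kernel exists for the device, this raises rather than falling back.
    import torch
    import torch.nn.functional as F
    from torch.nn.attention import sdpa_kernel, SDPBackend

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    q = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
    k, v = torch.randn_like(q), torch.randn_like(q)

    try:
        with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            out = F.scaled_dot_product_attention(q, k, v)
        print("flash attention kernel available:", tuple(out.shape))
    except RuntimeError as err:
        print("no flash attention kernel on this setup:", err)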