I'm running Fedora Silverblue as my host OS, this is the kernel:
$ uname -a
Linux fedora 6.18.9-200.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Feb 6 21:43:09 UTC 2026 x86_64 GNU/Linux
You also need to set a few kernel command line parameters to allow most of your memory to be used as graphics memory. I have the following in my kernel command line; each value is 110 GiB expressed as a number of pages (I figure leaving 18 GiB or so for CPU memory is probably a good idea):
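The exact parameters didn't survive in the comment. As a sketch of what this usually looks like for unified-memory AMD setups (the `ttm.*` parameter names are my assumption, not quoted from the original config), on Silverblue you'd stage them with `rpm-ostree kargs` and reboot:

```shell
# 110 GiB in 4 KiB pages: 110 * 1024^3 / 4096 = 28835840
rpm-ostree kargs \
  --append=ttm.pages_limit=28835840 \
  --append=ttm.page_pool_size=28835840
```

On a 128 GiB machine that leaves roughly 18 GiB for the CPU side, matching the figure above.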
Then I'm running llama.cpp in the official llama.cpp Docker containers. The Vulkan one works out of the box. I had to build the container myself for ROCm: the official llama.cpp container ships ROCm 7.0, but I need 7.2 to be compatible with my kernel. I haven't actually compared the speed directly between Vulkan and ROCm yet; I'm pretty much at the point where I've just gotten everything working.
(As mentioned, I'm still just getting this set up, so I've been moving around between using `-hf` to pull directly from HuggingFace and using `uvx hf download` in advance; sorry that these commands are a bit messy. The problem with using `-hf` in llama.cpp is that you'll sometimes get surprise updates where it has to download many gigabytes before starting up.)
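For the download-in-advance route, a sketch of what that looks like (the repo id and local directory are placeholders of mine, and the `--include`/`--local-dir` flags are my assumption about the `hf` CLI):

```shell
# pre-fetch the GGUF so server startup never blocks on a download
uvx hf download some-org/some-model-GGUF \
  --include "*.gguf" \
  --local-dir ~/models/some-model
```

You then point llama.cpp at the local `.gguf` with `-m` instead of `-hf`.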
In a checkout of the llama.cpp repo:
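The build command itself was elided. A sketch of what building the ROCm server image from a checkout could look like (the Dockerfile path, build target, and image tag are my guesses; check the repo's `.devops/` directory, and you'd bump the ROCm base image there to get 7.2):

```shell
# build a local llama.cpp server image with ROCm support
docker build -t local/llama.cpp:server-rocm \
  --target server \
  -f .devops/rocm.Dockerfile .
```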
Then I run the container with something like:

Still getting my setup dialed in, but this is working for now.

Edit: Oh, yeah, you had asked about Qwen3 Coder Next. That command was:
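Neither run command made it into the comment. As an illustration of the "run the container" step with the published Vulkan server image (the image tag is the real published one; the device mapping, paths, ports, and model file are assumptions for a typical setup, and the Qwen3 Coder Next invocation would swap in that model's GGUF):

```shell
# serve a local GGUF over the GPU via the Vulkan container
docker run --rm -it \
  --device /dev/dri \
  -v "$HOME/models:/models" \
  -p 8080:8080 \
  ghcr.io/ggml-org/llama.cpp:server-vulkan \
  -m /models/some-model.gguf \
  --host 0.0.0.0 --port 8080 \
  -ngl 99
```

`-ngl 99` offloads all layers to the GPU, which is the point of giving it most of your memory in the first place.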