Largely Vulkan. Microsoft internally is a huge consumer of DirectML, specifically for the LLM team doing Phi and the Copilot deployment that lives on Azure.
I'm not sure if it's just the implementation, but I tried llama.cpp on Vulkan and it was much slower than on CUDA.
Such a huge consumer that they deprecated it.
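For anyone who wants to reproduce that Vulkan-vs-CUDA comparison: a minimal sketch, assuming a recent llama.cpp checkout with its GGML_VULKAN/GGML_CUDA cmake options and the bundled llama-bench tool (model.gguf is a placeholder path; each build needs the matching Vulkan SDK or CUDA toolkit installed):

    # Build one binary per backend from the same llama.cpp source tree.
    cmake -B build-vulkan -DGGML_VULKAN=ON
    cmake --build build-vulkan --config Release
    cmake -B build-cuda -DGGML_CUDA=ON
    cmake --build build-cuda --config Release

    # llama-bench reports prompt-processing and token-generation speed in tokens/s;
    # -ngl 99 offloads all layers to the GPU so both runs are GPU-bound.
    ./build-vulkan/bin/llama-bench -m model.gguf -ngl 99
    ./build-cuda/bin/llama-bench -m model.gguf -ngl 99

Run against the same model file on the same machine, any remaining gap should mostly reflect the two backends' kernel implementations, though the size of the gap tends to vary with the quantization format and the GPU.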