GGML models still run through llama.cpp, and GPU acceleration there still requires CUDA to be installed, unfortunately. I saw a PR for DirectML support, but I'm not really holding my breath.
You don't have to install the whole CUDA toolkit. NVIDIA ships a redistributable with just the runtime libraries, so you can drop them next to the binary (see the sketch below).
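For illustration, a minimal sketch of why that works: a program built against the CUDA runtime API only needs the NVIDIA driver plus the redistributable cudart library at run time; the full toolkit is only needed to compile. The file name (`probe.cu`) is hypothetical, but `cudaGetDeviceCount`, `cudaGetErrorString`, and nvcc's `-cudart shared` flag are standard CUDA:

```cuda
// probe.cu -- minimal sketch of a CUDA-runtime-only program.
// Deploying this only requires shipping the redistributable cudart
// (libcudart.so / cudart64_*.dll) next to the executable; no full
// CUDA toolkit install is needed on the user's machine.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    if (err != cudaSuccess) {
        // Fails cleanly if no NVIDIA driver / usable GPU is present.
        std::printf("CUDA unavailable: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("found %d CUDA device(s)\n", n);
    return 0;
}
```

Build with `nvcc -cudart shared probe.cu -o probe` so it links the shared runtime instead of the static one, then ship the cudart library alongside the executable.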