Ollama (also a wrapper around llama.cpp) has GPU support; unless you're really in love with the idea of bundling the weights into the inference executable, it's probably a better choice for most people.
When I said
> such great performance that I've mostly given up on GPU for LLMs
I meant that I used to run ollama on GPU, but llamafile gave approximately the same performance on CPU alone, so I switched. That might just be because my GPU is weak by current standards, but that is in fact the comparison I was making.
Edit: Though to be clear, ollama would easily be my second pick; it also has minimal dependencies and is super easy to run locally.
Ollama is great if you're really in love with the idea of having your multi-gigabyte models (likely the majority of your disk space) stored under obfuscated UUID filenames. Ollama also still hasn't addressed the license violations I reported to them back in March: https://github.com/ollama/ollama/issues/3185
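For anyone curious what that looks like on disk, here's a minimal sketch that just lists the blob store by size, assuming the default ~/.ollama/models location (the OLLAMA_MODELS environment variable can relocate it). The filenames are opaque content hashes, so you have to dig through the manifests to work out which blob belongs to which model:

```python
# Minimal sketch: list Ollama's blob store, largest files first.
# Assumes the default location (~/.ollama/models); if you've set
# OLLAMA_MODELS, point blob_dir there instead.
from pathlib import Path

blob_dir = Path.home() / ".ollama" / "models" / "blobs"
for blob in sorted(blob_dir.iterdir(), key=lambda p: p.stat().st_size, reverse=True):
    # Filenames here are opaque content hashes, not human-readable model names.
    print(f"{blob.stat().st_size / 1e9:6.1f} GB  {blob.name}")
```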