When I said
> such great performance that I've mostly given up on GPU for LLMs
I mean that I used to run ollama on GPU, but llamafile gave approximately the same performance on CPU alone, so I switched. Now, that might just be because my GPU is weak by current standards, but that is in fact the comparison I was making.
Edit: To be clear, though, ollama would easily be my second pick; it also has minimal dependencies and is super easy to run locally.