Ah come on, lazy? As long as it works with whatever runtime you wanna use instead of hardcoding its own solution, it should work fine. If you want to use Candle and have to implement new architectures with it yourself, you still can: just expose it over HTTP.
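Something like this is genuinely all it takes (a std-only Rust sketch; generate() is a hypothetical stand-in for your own Candle forward pass, not Candle's actual API, and the request parsing is deliberately naive):

    use std::io::{Read, Write};
    use std::net::TcpListener;

    // Hypothetical wrapper around a custom Candle model; stands in for
    // whatever architecture you implemented yourself.
    fn generate(prompt: &str) -> String {
        format!("echo: {prompt}") // replace with your Candle forward pass
    }

    fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080")?;
        for stream in listener.incoming() {
            let mut stream = stream?;
            // A single read is enough for small requests in this sketch.
            let mut buf = [0u8; 4096];
            let n = stream.read(&mut buf)?;
            let req = String::from_utf8_lossy(&buf[..n]);
            // Naive parse: treat everything after the headers as the prompt.
            let prompt = req.split("\r\n\r\n").nth(1).unwrap_or("").trim();
            let body = generate(prompt);
            let resp = format!(
                "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
                body.len(),
                body
            );
            stream.write_all(resp.as_bytes())?;
        }
        Ok(())
    }

Then "curl -d 'hi there' localhost:8080" gets you a completion from whatever model you've wired in, and any tool that speaks HTTP can use it.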
I think one of the major problems with the current incarnation of AI solutions is that they're extremely brittle and hacked together. It's a fun, exciting time, especially for us technical people, but normies just want stuff to "work."
Even copy-pasting an API key is probably too much of a hurdle for regular folks, let alone running a local ollama server in a Docker container.