Ah true, missed that! Still a bit cumbersome and lazy, imo; I'm a fan of just shipping with that capability out-of-the-box (Huggingface's Candle is fantastic for downloading/syncing/running models locally).
Ah come on, lazy? As long as it works with the runtime you wanna use, instead of hardcoding their own solution, it should work fine. If you want to use Candle, even if you have to implement new architectures to be able to use it, you still can: just expose it over HTTP.
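For what it's worth, "expose it over HTTP" can be pretty small. Here's a minimal sketch using Python's stdlib, where `generate` is a hypothetical stand-in for whatever runtime actually does inference (a Candle-based binary, llama.cpp, anything); the real client never needs to know.

```python
# Minimal sketch: wrap a local inference runtime behind a tiny HTTP
# endpoint so any client can talk to it, regardless of the backend.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    # Hypothetical placeholder: swap in the actual model call here.
    return f"echo: {prompt}"

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body ({"prompt": "..."}).
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(
            {"completion": generate(payload.get("prompt", ""))}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8080) -> None:
    # Run this on the inference machine; clients just POST to it.
    HTTPServer(("127.0.0.1", port), InferenceHandler).serve_forever()

# serve()  # uncomment to actually start the server
```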
In a local setup you still usually want to split the machine that runs inference from the client that uses it; there are often non-trivial resources involved, like Chromium, compilation, databases, etc., that you don't want to pollute the inference machine with.