Ah true, missed that! Still a bit cumbersome and lazy, imo; I'm a fan of just shipping with that capability out-of-the-box (Huggingface's Candle is fantastic for downloading/syncing/running models locally).
Ah come on, lazy? As long as it works with the runtime you wanna use, instead of hardcoding their own solution, it should work fine. If you want to use Candle, even if you have to implement new architectures to be able to use it, you still can: just expose it over HTTP.
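For what it's worth, "expose it over HTTP" can be pretty small. Here's a minimal sketch using Python's stdlib, where `generate` is a hypothetical stand-in for whatever runtime actually does inference (a Candle-based binary, llama.cpp, anything); the real client never needs to know.

```python
# Minimal sketch: wrap a local inference runtime behind a tiny HTTP
# endpoint so any client can talk to it, regardless of the backend.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    # Hypothetical placeholder: swap in the actual model call here.
    return f"echo: {prompt}"

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body ({"prompt": "..."}).
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(
            {"completion": generate(payload.get("prompt", ""))}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8080) -> None:
    # Run this on the inference machine; clients just POST to it.
    HTTPServer(("127.0.0.1", port), InferenceHandler).serve_forever()

# serve()  # uncomment to actually start the server
```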
In a local setup you still usually want to split the machine that runs inference from the client that uses it; there are often non-trivial resources involved, like Chromium, compilation, databases, etc., that you don't want to pollute the inference machine with.