
dvt today at 3:21 AM

Ah true, missed that! Still a bit cumbersome & lazy imo; I'm a fan of just shipping with that capability out-of-the-box (Huggingface's Candle is fantastic for downloading/syncing/running models locally).
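For anyone curious what that looks like in practice, here's a minimal sketch of the download/sync/run-locally flow, assuming the `hf-hub`, `candle-core`, and `anyhow` crates; the repo id and filename are hypothetical placeholders:

```rust
// Minimal sketch: fetch weights from the Hub and load them with Candle.
// Assumes the `hf-hub`, `candle-core`, and `anyhow` crates; the repo id
// and filename below are hypothetical placeholders.
use candle_core::{safetensors, Device};
use hf_hub::api::sync::Api;

fn main() -> anyhow::Result<()> {
    // hf-hub caches downloads in the usual Hugging Face cache directory,
    // so repeated runs reuse the local copy instead of re-downloading.
    let api = Api::new()?;
    let repo = api.model("some-org/some-model".to_string()); // hypothetical repo id
    let weights = repo.get("model.safetensors")?; // PathBuf to the cached file

    // Load the raw tensors onto a device; wiring them into an actual
    // architecture (Llama, Mistral, ...) is model-specific code on top.
    let tensors = safetensors::load(&weights, &Device::Cpu)?;
    println!("loaded {} tensors from {}", tensors.len(), weights.display());
    Ok(())
}
```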


Replies

mirekrusin today at 6:42 AM

In a local setup you still usually want to split the machine that runs inference from the client that uses it; there are often non-trivial resources involved, like Chromium, compilation, databases, etc., that you don't want to pollute the inference machine with.

embedding-shape today at 3:30 AM

Ah come on, lazy? As long as it works with the runtime you want to use, instead of hardcoding their own solution, it should work fine. If you want to use Candle, even if you have to implement new architectures with it to be able to use it, you still can; just expose it over HTTP.
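That last step is mostly glue. A minimal sketch, assuming the `tiny_http` crate and a hypothetical `generate()` function standing in for whatever Candle model you've implemented:

```rust
// Minimal sketch: wrap a local model behind a tiny HTTP endpoint so any
// client that speaks HTTP can use it. Assumes the `tiny_http` crate;
// `generate()` is a hypothetical stand-in for running a Candle model.
use std::io::Read;
use tiny_http::{Method, Response, Server};

// Hypothetical placeholder for "run the locally loaded model on this prompt".
fn generate(prompt: &str) -> String {
    format!("echo: {prompt}")
}

fn main() {
    let server = Server::http("127.0.0.1:8000").expect("failed to bind");
    for mut request in server.incoming_requests() {
        if matches!(request.method(), Method::Post) {
            // Treat the raw request body as the prompt; a real server would
            // parse JSON and mimic whatever API the client tool expects.
            let mut prompt = String::new();
            let _ = request.as_reader().read_to_string(&mut prompt);
            let reply = generate(&prompt);
            let _ = request.respond(Response::from_string(reply));
        } else {
            let _ = request.respond(Response::from_string("POST a prompt").with_status_code(405));
        }
    }
}
```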
