Hacker News

NiloCK yesterday at 2:28 PM

> Didn't expect to go back to macOS but they're basically the only feasible consumer option for running large models locally.

I presume here you are referring to running on the device in your lap.

How about a headless linux inference box in the closet / basement?

Return of the home network!


Replies

Aurornis yesterday at 2:32 PM

Apple devices have the high memory bandwidth necessary to run LLMs at reasonable rates.

It’s possible to build a Linux box that does the same, but you’ll be spending a lot more to get there. With Apple, a $500 Mac Mini has memory bandwidth that you just can’t get anywhere else for the price.
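A rough sanity check of that claim (a sketch with assumed numbers, not figures from the thread: ~120 GB/s for a base Mac Mini and a ~4.5-bit 8B model): single-stream decode speed is roughly memory bandwidth divided by the bytes of weights read per token.

    # Back-of-envelope decode ceiling: tok/s ~= bandwidth / bytes read per token.
    # All numbers below are assumptions for illustration, not from the thread.
    bandwidth_gb_s = 120      # assumed base Mac Mini memory bandwidth
    params_billion = 8        # assumed 8B-parameter dense model
    bytes_per_param = 0.56    # ~4.5 bits/weight quantization
    gb_read_per_token = params_billion * bytes_per_param
    print(f"~{bandwidth_gb_s / gb_read_per_token:.0f} tok/s upper bound")  # ~27 tok/s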

jannniii yesterday at 2:42 PM

Indeed, and I've got two words for you:

Strix Halo

mythz yesterday at 2:54 PM

Not feasible for large models: it takes 2x 512GB M3 Ultras to run the full Kimi K2.5 model at a respectable 24 tok/s. Hopefully the M5 Ultra can improve on that.
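A back-of-envelope check of those figures, assuming Kimi K2.5 is roughly a 1T-parameter MoE with ~32B active parameters served at 8 bits and ~819 GB/s of bandwidth per M3 Ultra (specs assumed, not stated in the thread): the weights alone come to about 1 TB, which is why it takes two 512GB machines, and the per-token read of the active experts puts the bandwidth ceiling in the mid-20s tok/s, consistent with the quoted 24 tok/s.

    # Rough check; all specs below are assumptions for illustration.
    total_params_b = 1000    # assumed ~1T total parameters (MoE)
    active_params_b = 32     # assumed ~32B active parameters per token
    bytes_per_param = 1      # 8-bit weights
    bandwidth_gb_s = 819     # assumed M3 Ultra memory bandwidth
    print(f"weights ~{total_params_b * bytes_per_param / 1000:.1f} TB")  # ~1.0 TB -> needs 2x 512GB
    print(f"ceiling ~{bandwidth_gb_s / (active_params_b * bytes_per_param):.0f} tok/s")  # ~26 tok/s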