> Didn't expect to go back to macOS but they're basically the only feasible consumer option for running large models locally.
I presume here you are referring to running on the device in your lap.
How about a headless Linux inference box in the closet / basement?
Return of the home network!
Not feasible for large models: it takes two 512GB M3 Ultras to run the full Kimi K2.5 model at a respectable 24 tok/s. Hopefully the M5 Ultra can improve on that.
Apple devices have the high memory bandwidth necessary to run LLMs at reasonable rates.
It's possible to build a Linux box that does the same, but you'll be spending a lot more to get there. With Apple, a $500 Mac Mini has memory bandwidth that you just can't get anywhere else for the price.
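For a rough sense of why bandwidth is the deciding factor: single-stream decode speed is approximately memory bandwidth divided by the bytes of weights streamed per generated token. Here's a back-of-envelope sketch; the bandwidth figures, the ~32B active parameters, the 4-bit quantization, and the efficiency factor are all illustrative assumptions, not benchmarks.

```python
# Back-of-envelope: decode throughput is bounded by how fast the active
# weights can be streamed from memory for each generated token.

def est_tokens_per_sec(bandwidth_gb_s: float,
                       active_params_billions: float,
                       bytes_per_param: float,
                       efficiency: float = 0.6) -> float:
    """Rough upper bound on single-stream decode tok/s."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_param
    return efficiency * bandwidth_gb_s * 1e9 / bytes_per_token

# Assumed figures: ~800 GB/s for an M3 Ultra, ~120 GB/s for a base Mac Mini,
# and a MoE model with ~32B active parameters quantized to 4 bits (0.5 B/param).
for name, bw in [("M3 Ultra (~800 GB/s)", 800), ("Base Mac Mini (~120 GB/s)", 120)]:
    print(f"{name}: ~{est_tokens_per_sec(bw, 32, 0.5):.0f} tok/s")
```

The estimate is crude (it ignores KV-cache reads, compute limits, and batching), but it shows why a high-bandwidth unified-memory machine decodes large models at usable speeds while a typical consumer PC without a big GPU does not.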