Apple devices have the high memory bandwidth necessary to run LLMs at reasonable rates.
It’s possible to build a Linux box that does the same, but you’ll spend a lot more to get there. With Apple, a $500 Mac Mini has memory bandwidth that you just can’t get anywhere else for the price.
And only Apple devices offer 512GB of unified memory, which matters when you have to combine larger models (even MoE ones) with the bigger context/KV cache you need for agentic workflows. You can make do with less, but only by slowing things down a whole lot.
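Rough numbers behind both points (all figures below are illustrative assumptions, not benchmarks): decode speed is roughly bandwidth divided by the bytes touched per token, and a long context adds a KV cache on top of the weights:

```python
def decode_tps(bandwidth_gbs, active_params_b, bytes_per_param=0.5):
    """Upper bound on tokens/sec for a bandwidth-bound decode (4-bit default)."""
    return bandwidth_gbs / (active_params_b * bytes_per_param)

def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """fp16 K+V cache size for a dense attention stack."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem / 1e9

# e.g. ~32B active params on an ~800 GB/s Studio vs a ~120 GB/s base Mini:
print(decode_tps(800, 32), decode_tps(120, 32))   # ~50 vs ~7.5 tok/s ceiling
# e.g. 128k tokens of context on a 60-layer model with 8 KV heads of dim 128:
print(kv_cache_gb(128_000, 60, 8, 128))           # ~31 GB just for the KV cache
```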
> a $500 Mac Mini has memory bandwidth that you just can’t get anywhere else for the price.
The cheapest new Mac Mini is $600 on Apple's US store.
And it has a 128-bit memory interface running LPDDR5X-7500, nothing exotic. The laptop I bought last year for under $500 has roughly the same memory speed, and newer machines are even faster.
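For anyone who wants to check the arithmetic, peak bandwidth for that interface works out to about 120 GB/s:

```python
# Theoretical peak for a 128-bit LPDDR5X-7500 interface
bus_bits, mt_per_s = 128, 7500              # bus width, mega-transfers/second
gb_per_s = (bus_bits / 8) * mt_per_s / 1000
print(f"~{gb_per_s:.0f} GB/s")              # ~120 GB/s, same ballpark as the base Mini
```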
Only the M4 Pro Mac Minis have faster RAM than you’ll get in an off-the-shelf Intel/AMD laptop. The M4 Pros start at $1399.
You want the M4 Max (or Ultra) in the Mac Studios to get the real stuff.
But a $500 Mac Mini has nowhere near the memory capacity to run such a model. You'd need at least two 512GB machines chained together to run it, or maybe one if you quantized the crap out of it.
And Apple completely overcharges for memory, so.
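A quick sanity check on that capacity math, assuming a model in the ~1T-parameter range (which is what two 512GB boxes implies; the exact size depends on the model):

```python
def weights_gb(params_billions, bits_per_param):
    # Weights only; the KV cache and activations come on top of this.
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"1T params @ {bits}-bit: ~{weights_gb(1000, bits):,.0f} GB")
# 16-bit -> ~2,000 GB (four 512GB machines), 8-bit -> ~1,000 GB (two),
# 4-bit -> ~500 GB (one, barely, with nothing left over for the KV cache)
```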
This is a model you use via a cheap API provider like DeepInfra, or get on their coding plan. It's nice that it will be available as open weights, but not practical for mere mortals to run.
But I can see a large corporation that wants to avoid sending code offsite setting up their own private infra to host it.
With Apple devices you get very fast token generation once things get going, but they are inferior to Nvidia precisely during prefill (processing the prompt/context), before generation really starts.
For our code-assistant use cases, local inference on Macs tends to favor workflows with a lot of generation and little reading, which is the opposite of how many of us use Claude Code.
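To make that concrete, here's a toy latency model for a Claude-Code-style turn (big context read, short reply); the throughput numbers are illustrative guesses, not measurements:

```python
def turn_seconds(prompt_toks, gen_toks, prefill_tps, decode_tps):
    # Prefill (reading the prompt) is compute-bound; decode is bandwidth-bound.
    return prompt_toks / prefill_tps + gen_toks / decode_tps

prompt, gen = 50_000, 1_000
mac = turn_seconds(prompt, gen, prefill_tps=500, decode_tps=40)
gpu = turn_seconds(prompt, gen, prefill_tps=10_000, decode_tps=60)
print(f"Mac: ~{mac:.0f}s  GPU box: ~{gpu:.0f}s")  # prefill dominates the Mac turn
```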
Source: I started getting Mac Studios with max RAM as soon as the first Llama model was released.