Hacker News

chatmasta · today at 4:21 AM · 11 replies

I bet there’s gonna be a banger of a Mac Studio announced in June.

Apple really stumbled into making the perfect hardware for home inference machines. Does any hardware company come close to Apple in terms of unified memory and single machines for high throughput inference workloads? Or even any DIY build?

When it comes to traditional "pro workloads," like video rendering or software compilation, you've always been able to build a PC that outperforms any Apple machine at the same price point. But inference is unique because its performance scales with memory throughput, and you can't assemble that by wiring together off-the-shelf parts in a consumer form factor.
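
To make the memory-throughput point concrete, here is a rough roofline sketch: for dense-model decoding, every generated token has to stream roughly the full set of weights through memory, so tokens/sec is bounded by bandwidth divided by model size. The specific numbers below (M3 Ultra at ~819 GB/s, a 70B model at 4-bit) are illustrative, not benchmarks.

```python
def max_decode_tps(bandwidth_gbs: float, params_b: float, bits_per_weight: int) -> float:
    """Upper bound on decode tokens/sec for a dense, memory-bound LLM."""
    model_gb = params_b * bits_per_weight / 8  # GB of weights streamed per token
    return bandwidth_gbs / model_gb

# Illustrative: M3 Ultra ~819 GB/s, 70B parameters at 4-bit quantization
print(round(max_decode_tps(819, 70, 4), 1))  # ~23.4 tokens/sec ceiling
```

Real throughput lands below this ceiling (attention, KV-cache reads, overhead), but the bound explains why bandwidth, not raw FLOPS, dominates dense LLM decoding.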

It’s simply not possible to DIY a homelab inference server better than the M3+ for inference workloads, at anywhere close to its price point.

They are perfectly positioned to capitalize on the next few years of model architecture developments. No wonder they haven’t bothered working on their own foundation models… they can let the rest of the industry do their work for them, and by the time their Gemini licensing deal expires, they’ll have their pick of the best models to embed with their hardware.


Replies

whywhywhywhy · today at 10:04 AM

> But inference is unique because its performance scales with high memory throughput, and you can’t assemble that by wiring together off the shelf parts in a consumer form factor.

Nvidia significantly outperforms the Mac on diffusion inference and many other workloads. It's not as simple as the current Mac chips being entirely better for this.

dragonwriter · today at 10:22 AM

> Apple really stumbled into making the perfect hardware for home inference machines

For LLMs. For inference with other kinds of models, where the amount of compute needed relative to the amount of data transfer is higher, Apple is less ideal, and systems with lower memory bandwidth but more FLOPS shine. And if things like Google's TurboQuant work out for efficient KV-cache quantization, Apple could lose a lot of that edge for LLM inference too, since that would reduce the amount of data shuffling relative to compute.
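
A quick sketch of why KV-cache quantization matters for the data-shuffling side: per-token cache size is 2 (K and V) × layers × KV heads × head dim × bytes per element, and at long contexts the cache read per decode step rivals the weights themselves. The model shape below is an illustrative Llama-70B-like configuration, not a claim about any specific product or about TurboQuant's exact format.

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                bits: int, context_len: int) -> float:
    """Total KV-cache size in GB for a given model shape and context length."""
    bytes_per_token = 2 * layers * kv_heads * head_dim * bits / 8
    return bytes_per_token * context_len / 1e9

# Illustrative 70B-class shape: 80 layers, 8 KV heads, head dim 128, 32k context
fp16 = kv_cache_gb(80, 8, 128, 16, 32_768)  # ~10.7 GB at fp16
int4 = kv_cache_gb(80, 8, 128, 4, 32_768)   # ~2.7 GB at 4-bit
print(round(fp16, 1), round(int4, 1))
```

Cutting the cache 4x cuts the memory traffic of attention by roughly the same factor, which is the mechanism by which such quantization would shift the bottleneck back toward compute.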

HerbManic · today at 4:53 AM

Jeff Geerling building that 1.5TB cluster out of four Mac Studios was pretty much all the proof needed to show how the Mac Pro is struggling to find a place anymore.

https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-stu...

robotswantdata · today at 7:24 AM

DGX workstations: expensive, but they take PCIe cards as well.

https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...

spacedcowboy · today at 8:23 AM

Agreed. I’m planning on selling my 512GB M3 Ultra Studio in the next week or so (I just wrenched my back so I’m on bed-rest for the next few days) with an eye to funding the M5 Ultra Studio when it’s announced at WWDC.

I can live without the RAM for a couple of months to get a good price for it, especially since Apple don’t sell that model (with the RAM) any more.

port11 · today at 8:23 AM

As to a better or cheaper homelab: it depends on the build. AMD AI Max builds do exist, and they also use unified memory. I could argue the competition was, for a long time, selling much more affordable RAM, so you could get a better build outside Apple Silicon.

tannhaeuser · today at 6:46 AM

For LLMs and other purely memory-bound workloads, yes, but for e.g. diffusion models their FPU/SIMD performance is lacking.

fooker · today at 9:34 AM

The typical inference workloads have moved quite a bit in the last six months or so.

Your point would have been largely correct in the first half of 2025.

Now, you're going to have a much better experience with a couple of Nvidia GPUs.

This is for two reasons: reasoning models require a pretty high number of tokens per second to do anything useful, and we are seeing small quantized and distilled reasoning models working almost as well as the ones needing terabytes of memory.
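
The tokens-per-second pressure is easy to see with simple arithmetic: a reasoning model burns hidden chain-of-thought tokens before the visible answer, so wall-clock latency is (reasoning tokens + answer tokens) / decode speed. The token counts below are hypothetical round numbers, not measurements of any particular model.

```python
def response_seconds(reasoning_tokens: int, answer_tokens: int, tps: float) -> float:
    """Wall-clock time to finish one response at a given decode speed."""
    return (reasoning_tokens + answer_tokens) / tps

# Hypothetical: 5,000 hidden reasoning tokens plus a 500-token answer
print(round(response_seconds(5_000, 500, 20), 1))   # 275.0 s at 20 tok/s
print(round(response_seconds(5_000, 500, 120), 1))  # 45.8 s at 120 tok/s
```

At ~20 tok/s a single reasoning turn takes minutes; at GPU-class decode speeds it becomes interactive, which is the experience gap being described.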

hermanzegerman · today at 9:25 AM

Framework offers the Ryzen AI Max with 128GB of unified RAM for $2,699.

That's a pretty good deal I would think

https://frame.work/de/de/products/desktop-diy-amd-aimax300/c...

DeathArrow · today at 8:08 AM

Still, running two to four 5090s will beat anything Apple has to offer for both inference and training.

rubyn00bie · today at 5:22 AM

I don't think Apple just stumbled into it, and while I totally agree that Apple is killing it with their unified memory, I think we're going to see a pivot from NVidia and AMD. The biggest reason, I think, is: OpenAI has committed to an enormous amount of capex it simply cannot afford. It does not have the lead it once did, and most end users simply do not care. There are no network effects. Anthropic at this point has completely consumed, as far as I can tell, the developer market: the one market that is actually passionate about AI. That's largely due to a huge advantage of the developer space: end users cannot tell whether an "AI" or a human wrote the code. That's not true for almost every other application of AI at this point.

If the OpenAI domino falls, and I'd be happy to admit if I'm wrong, we're going to see a near catastrophic drop in prices for RAM and demand by the hyperscalers to well... scale. That massive drop will be completely and utterly OpenAI's fault for attempting to bite off more than it can chew. In order to shore up demand, we'll see NVidia and AMD start selling directly to consumers. We, developers, are consumers and drive demand at the enterprises we work for based on what keeps us both engaged and productive... the end result being: the ol' profit flywheel spinning.

Both NVidia and AMD are capable of building GPUs that absolutely wreck Apple's best. A huge reason for this is that Apple needs unified memory to keep their money maker (laptops) profitable and performant; and while it helps their profitability, it also forces them into less performant solutions. If NVidia dropped a 128GB GPU with GDDR7 at $4k, absolutely no one would be looking at a Mac for inference. My 5090 is unbelievably fast at inference even if it can't load gigantic models, and quite frankly the 6-bit quantized versions of Qwen 3.5 are fantastic, but if it could load larger open-weight models I wouldn't even bother checking Apple's pricing page.
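
The VRAM-budget tradeoff here is easy to sketch: at a given quantization, the largest model that fits is roughly usable VRAM × 8 / bits-per-weight. The 10% overhead reserve for KV cache and activations is my own rough assumption, and the 128GB card is the hypothetical from the comment above.

```python
def max_params_b(vram_gb: float, bits_per_weight: float,
                 overhead_frac: float = 0.1) -> float:
    """Largest model (billions of params) that fits in VRAM at a given quant.

    overhead_frac reserves a slice of VRAM for KV cache and activations
    (an assumed 10%, not a measured figure).
    """
    usable = vram_gb * (1 - overhead_frac)
    return usable * 8 / bits_per_weight

print(round(max_params_b(32, 6), 1))   # 32 GB (5090-class) at 6-bit: ~38.4B params
print(round(max_params_b(128, 6), 1))  # hypothetical 128 GB card: ~153.6B params
```

That 4x jump in loadable model size is why a large-VRAM consumer GPU would change the calculus against unified-memory Macs.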

tl;dr: competition is as stiff as it is vicious. Apple's "lead" in inference exists only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the rug out from under Apple.
