Hacker News

Tangokat today at 2:09 PM

"Scaling up performance from M5 and offering the same breakthrough GPU architecture with a Neural Accelerator in each core, M5 Pro and M5 Max deliver up to 4x faster LLM prompt processing than M4 Pro and M4 Max, and up to 8x AI image generation than M1 Pro and M1 Max."

Are they doubling down on local LLMs then?

I still think Apple has a huge opportunity in privacy-first LLMs, but so far I'm not seeing much execution. Wondering if that will change with the overhaul of Siri this spring.


Replies

rafark today at 7:17 PM

> Are they doubling down on local LLMs then?

I love the push to local LLMs. But it’s hilarious how reluctant Apple was just a few years ago to even mention “AI” in its keynotes, and fast-forward a couple of years and they’ve fully embraced it. I mean, I like that they embraced it rather than being “different” (stubborn) and staying behind the rest of the tech industry. It’s the smart choice. I just think it’s funny.

butILoveLife today at 2:22 PM

I think it's just marketing, and the marketing is working. Look how many people bought Minis and ended up just paying for API calls anyway. (Saw it IRL 2x, see it on reddit openclaw daily.)

I don't mind it; I own Apple stock. But I'm definitely not buying into their rebranding of an integrated GPU under the guise of Unified Memory.

whizzter today at 2:16 PM

We had a workshop 6 months ago, and while I've always been sceptical of OpenAI et al.'s silly AGI/ASI claims, the investments have pointed the way to a lot of new technology and have let a genie out of the bottle that won't be put back.

Now, extrapolating from how Sun servers that cost a fortune around the year 2000 can be emulated by a $5 VPS today, Apple is seeing that they can maybe grab the local LLM workloads if they act now with their integrated chip development.

But to grab that, they need developers to rely less on CUDA via Python, or to have other proper hardware support for those environments, and that won't happen without the hardware being there first and machines that can be built with enough memory (refreshing to see Apple support 128GB, even if it'll probably bleed you dry).

woadwarrior01 today at 3:33 PM

> Are they doubling down on local LLMs then?

Neural Accelerators (aka NAX) accelerate matmuls with tile sizes >= 32. From a very high-level perspective, LLM inference has two phases: (chunked) prefill and decode. The former is matrix-matrix mults (GEMM) and the latter is matrix-vector mults (GEMV). Neural Accelerators make the former (prefill) faster and have no impact on the latter.
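
A rough numpy sketch of that shape difference (sizes are made up for illustration, not real model dimensions):

  import numpy as np

  d_model, n_prompt = 4096, 512            # illustrative sizes only
  W = np.random.randn(d_model, d_model)    # one weight matrix of the model

  # Prefill: all prompt tokens at once -> matrix-matrix product (GEMM).
  # Large tiles, so a matmul accelerator can help here.
  prompt_acts = np.random.randn(n_prompt, d_model)
  prefill_out = prompt_acts @ W            # (512, 4096) @ (4096, 4096)

  # Decode: one new token per step -> matrix-vector product (GEMV).
  # Dominated by streaming W from memory, so bandwidth is the limit.
  token_act = np.random.randn(1, d_model)
  decode_out = token_act @ W               # (1, 4096) @ (4096, 4096)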

tiffanyh today at 2:58 PM

> Are they doubling down on local LLMs then?

Apple is in the hardware business.

They want you to buy their hardware.

People using the cloud for compute essentially competes with their core business.

caycep today at 5:22 PM

Given all the supply issues with Nvidia, I think Apple's AI strategy should be local AI everything (not just LLMs), while also making Metal competitive with CUDA. Their ace in the hole is the unified memory model.

Lalabadie today at 2:38 PM

There are already a bunch of task-specific models running on their devices, so it makes sense to maintain and build capacity in that area.

I assume they have a moderate bet on on-device SLMs in addition to other ML models, but not much planned for LLMs, which at that scale might be good generalists but very poor at guaranteeing success for each specific, minute task you want done.

In short: 8GB storing tens of very small, fast, purpose-specific models is much better than a single 8GB LLM trying to do everything.

aurareturn today at 2:20 PM

> Are they doubling down on local LLMs then?

The Neural Accelerator was already present in the iPhone 17 and the M5 chip. This is not new for the M5 Pro/Max.

Apple's stated AI strategy is local where it can, cloud where it must. So "doubling down"? Probably not. But it fits their strategy.

Aurornis today at 2:27 PM

The hardware capabilities that make local LLMs fast are useful for a lot of different AI workloads. Local LLMs are a hot topic right now so that’s what the marketing team is using as an example to make it relatable.

Someone1234 today at 2:39 PM

Apple's AI strategy really kind of threads the needle cleverly.

"AI" (LLMs) may or may not have a bubble-pop moment, but until it does Apple get to ride it on these press releases and claims. But if the big-pop occurs, then Apple winds up with really fantastic hardware that just happens to be good at AI workloads (as well as general computing).

For example, image classification (e.g. face recognition/photo tagging), ASR+vocoders, image enhancement, OCR, et al, were popular before the current boom, and will likely remain popular after. Even if LLM usage dries up/falls out of vogue, this hardware still offers a significant user benefit.

ivankra today at 2:32 PM

But memory bandwidth (the bottleneck for LLM inference) is only marginally improved: 614 GB/s for the M5 Max vs 546 GB/s for the M4 Max. So where is this 4x improvement coming from?
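
A rough back-of-the-envelope, assuming decode is purely bandwidth-bound (every weight byte read once per token; the model size here is an assumption, not a measurement):

  # Decode speed ceiling ~= memory bandwidth / bytes of weights read per token
  model_size_gb = 40  # e.g. a ~70B model at 4-bit quantization, illustrative
  for name, bw_gb_s in [("M4 Max", 546), ("M5 Max", 614)]:
      print(f"{name}: ~{bw_gb_s / model_size_gb:.0f} tok/s decode ceiling")
  # Bandwidth only went 546 -> 614 GB/s, so decode barely moves.
  # The quoted 4x is about prompt processing (prefill), which is compute-bound.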

I think I'll pass on upgrading.

Sharlin today at 2:15 PM

"Apple Intelligence is even more capable while protecting users’ privacy at every step."

Remains to be seen how capable it actually is. But they're certainly trying to sell the privacy aspect.

game_the0ry today at 2:27 PM

> Are they doubling down on local LLMs then?

Honestly, I think that's the move for Apple. They do not seem to have any interest in creating a frontier lab/model; why would they, given the capex and how far behind they are.

But open-source models (Kimi, DeepSeek, Qwen) are getting better and better, and Apple makes excellent hardware for local LLMs. How appealing would it be to have your own LLM that knows all your secrets and doesn't serve you ads/slop, versus OpenAI and SCam Altman having all your secrets? I would seriously consider it even if the performance was not quite there. And no need for a subscription plus a CLI tool.

I think Apple is in the best position to offer native AI, versus the competition, which ends up being edge nodes for the big 4 frontier labs.

maherbeg today at 5:38 PM

Honestly, they can afford to wait another year or two for on-device models at the size they're looking for to become powerful enough.

blueTiger33 today at 4:03 PM

Have you seen that GitHub repo where they unlock the true power of the NE?

icar today at 3:07 PM

Didn't they announce a partnership with Google Gemini?

jmyeet today at 2:41 PM

Apple absolutely has a massive opportunity here because they use a shared-memory architecture.

As most people in or adjacent to the AI space know, Nvidia gatekeeps its best GPUs with the most memory by making them eye-wateringly expensive. It's a form of market segmentation. So consumer GPUs top out around 32GB (the 5090 currently), while the best AI GPUs (the H200) have 141GB (I just had to search). I think the previous gen was 80GB.

But these GPUs are north of $30k.

Now the Mac Studio currently tops out at 512GB of SHARED memory. That means you can potentially run a much larger model locally without distributing it across machines. It currently retails at $9,500, but that's relatively cheap in comparison.
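
Quick footprint math (weights only; the quantization levels are assumptions, and KV cache and activations add more on top):

  # Weight memory ~= parameter count * bytes per parameter
  def weights_gb(params_billion, bits_per_weight):
      return params_billion * 1e9 * bits_per_weight / 8 / 1e9

  for p in (70, 120, 405):
      print(f"{p}B params: {weights_gb(p, 16):.0f} GB @ fp16, "
            f"{weights_gb(p, 4):.0f} GB @ 4-bit")
  # A 405B model at 4-bit is ~200 GB of weights: it fits in a 512 GB Mac
  # Studio's unified memory, but not on any single consumer GPU.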

But, as it stands now, the best Apple chips have significantly lower memory bandwidth than Nvidia GPUs, and that really impacts tokens/second.

So I've been waiting to see if Apple will realize this and address it in the next generation of Mac Studios (and, to a lesser extent, MacBook Pros). The H200 seems to be 4.8TB/s. IIRC the 5090 is ~1.8TB/s. The best Apple offers is (IIRC) 819GB/s on the M3 Ultra.

Apple could really make a dent in Nvidia's monopoly here if they address some of these technical limitations.

So I just checked the memory bandwidth of these new chips, and it seems like the M5 is 153GB/s, the M5 Pro ~300, and the M5 Max ~600. This isn't a big jump from the M4 generation. I suspect the new Studios will probably barely break 1TB/s. I had been hoping for higher.

andy_ppp today at 2:28 PM

It is simply marketing nonsense. What they really mean (I think) is that they support matrix multiplication (matmul) at the hardware level, and since AI is mostly matrix multiplications, you'll get much faster inference (and some increase in training too) on this new hardware. I'm looking forward to seeing how fast a local 96GB+ LLM is on the M5 Max with 128GB of RAM.

jahller today at 2:14 PM

Looks like this will be their angle for the whole agentic AI topic.

general_reveal today at 2:33 PM

It’s not necessarily doubling down on local. The reality is your LLM should be inferencing every tick … the same way your brain thinks every. Fucking. Nano. Second.

So yes, the LLM should be inferencing on your prompt, but it should also be inferencing on 25,000 other things … in parallel.

Those are the compute needs.

We just need compute everywhere as fast as possible.

kilroy123 today at 2:27 PM

I've been so disappointed in Apple's lack of execution on this. There is so much potential for fantastic local models to run and intelligently connect to cloud models.

I just don't get why they're dropping the ball so much on this.

ignoramous today at 3:15 PM

> doubling down on local LLMs

I do think it'll become common to see pros purchasing expensive machines approaching £25k or more if they can run SoTA multi-modal LLMs fast and locally.

m3kw9 today at 2:53 PM

A useful LLM that needs 64GB of RAM and mid-double-digit core counts is not useful for 99% of their customers. The LLMs they have on the iPhone 17 certainly cannot do anything useful other than summarization and the like. It's a hardware constraint that they have.


lynx97 today at 2:26 PM

The topic is the MacBook, so my criticism is a little off. However, I really don't believe in this "local LLM" promise from Apple. My phone already gets noticeably warm if I answer 5 WhatsApp messages, and loses 5% of battery in the process. I highly doubt Apple will have a usable local LLM that doesn't drain my battery in minutes before 2030.

meisel today at 2:48 PM

What % of users actually care that much about local LLMs? It appears to still be an inferior (though maybe decent) experience compared to ChatGPT etc., and it requires very top-end hardware. Is privacy _that_ important to people when their Google search history has been a gateway to the soul for years? I wonder if these machines would cost significantly less (or put the cost toward other things, e.g. more CPU cores) without this emphasis on LLMs.

neya today at 3:05 PM

> I still think Apple has a huge opportunity in privacy first LLMs

This association of Apple with privacy needs to be put to rest. They have consistently proven otherwise, despite heavily marketing themselves as "privacy-first".

https://www.theguardian.com/technology/2019/jul/26/apple-con...
