Kirth • last Sunday at 8:26 AM • 7 replies

I was baffled by the comparison to the M4 Max. Does this mean that recent AMD chips will be performing at the same level, and what does that mean for on-device LLMs? .. or am I misunderstanding this whole ordeal?

Replies

izacus • last Sunday at 9:23 AM

Yes, AMD's Strix series uses an architecture similar to the M series, with massive memory bandwidth and big caches.

That results in significantly better performance.

cdavid • last Sunday at 9:45 AM

I was surprised by the previous comparison on the omarchy website, because Apple M* chips work really well for data science work that doesn't require a GPU.

It may be explained by integer vs float performance, though I am too lazy to investigate. A weak data point, multiplying an N=6000 matrix by itself in numpy:

  - SER 8 8745, linux: 280 ms -> 1.53 Tflops (single prec)
  - my m2 macbook air: ~180 ms -> ~2.4 Tflops (single prec)
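For anyone who wants to repeat this kind of measurement, here is a minimal sketch, not the exact script behind the numbers above. It assumes a dense N x N product costs about 2*N^3 floating point operations and takes the best of a few timed runs; the function names and the repeat count are made up for illustration.

```python
import time

import numpy as np


def tflops_from_seconds(n, seconds):
    """Convert the wall time of one N x N matrix product into Tflops.

    Assumes the usual 2 * N**3 operation count for a dense product.
    """
    return 2 * n**3 / seconds / 1e12


def matmul_tflops(n=6000, dtype=np.float32, repeats=3):
    """Time a @ a for a random n x n matrix; return (best_seconds, tflops)."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n), dtype=dtype)
    a @ a  # warm-up run, so one-time BLAS setup is not counted
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ a
        best = min(best, time.perf_counter() - start)
    return best, tflops_from_seconds(n, best)
```

Calling `matmul_tflops()` with the defaults runs the N=6000 single precision case. The result depends heavily on which BLAS your numpy build links against, so treat it as a rough data point rather than a hardware limit.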

This is 2 mins of benchmarking on the computers I have. It is not an apples-to-apples comparison (e.g. I use the numpy default BLAS on each platform), but it is not completely irrelevant to what people will do w/o much effort. And floating point is what matters for LLMs, not integer computation (which is most likely what the Ruby test suite is bottlenecked by).

Aurornis • last Sunday at 8:52 PM

An M4 Max has double the memory bandwidth and should run away with similarly optimized benchmarks.

An M4 Pro is the more appropriate comparison. I don't know why he's doing price comparisons to a Mac Studio when you can get a 64GB M4 Pro Mac Mini (the closest price/performance comparison point) for much less.

biehl • last Sunday at 9:23 AM

I think DHH compares them because they are both the latest, top-line chips. I think DHH's benchmarks show that they have different performance characteristics. But DHH's favorite benchmark favors whatever runs native Linux and Docker.

For local LLMs, the higher memory bandwidth of the M4 Max makes it much more performant.
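A back-of-the-envelope way to see why bandwidth dominates, as a sketch only: it assumes token generation is memory bound and that every generated token reads all model weights once, which gives an upper bound and not a prediction. The function name and the 250 GB/s, 500 GB/s and 40 GB figures are illustrative placeholders, not measured or quoted specs.

```python
def decode_tokens_per_second_bound(bandwidth_gb_s, model_size_gb):
    """Rough upper bound on tokens/s for memory-bound LLM decoding.

    Assumes each generated token streams the full set of weights from
    memory once, and ignores caches, compute limits and batching.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


# Placeholder numbers: two machines, one with double the bandwidth.
baseline = decode_tokens_per_second_bound(250, 40)
doubled = decode_tokens_per_second_bound(500, 40)
print(f"baseline: {baseline:.2f} tok/s, doubled bandwidth: {doubled:.2f} tok/s")
```

Under this simple model, doubling memory bandwidth doubles the ceiling on generation speed for the same model, no matter how the two chips compare on CPU benchmarks.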

Ars Technica has more benchmarks for non-LLM things: https://arstechnica.com/gadgets/2025/08/review-framework-des...

discordance • last Sunday at 1:24 PM

Not in perf/watt, but in raw perf, yes.

ekianjo • last Sunday at 11:11 PM

No. Macs have faster memory access, so Macs are faster for LLMs.

pengaru • last Sunday at 3:57 PM

It's not baffling once you realize TSMC is the main defining factor for all these chips. Apple Silicon is simply not that special in the grand scheme of things.

Why do you think TSMC's production being in Taiwan is basically a national security issue for the U.S. at this point?
