You'll want to look at benchmarks rather than the theoretical maximum bandwidth available to the system. Apple has been using bandwidth as a marketing point but you're not always able to use that bandwidth amount depending on your workload. For example, the M1 Max has 400GB/s advertised bandwidth but the CPU and GPU combined cannot utilize all of it [1]. This means Strix Halo could actually be better for LLM inference than Apple Silicon if it achieves better bandwidth utilization.
[1] https://web.archive.org/web/20250516041637/https://www.anand...