The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth is wasted idling for token prefill.
For inference workloads, it makes a lot more sense to optimize for prefill/ttft before maxing out memory bandwidth.