But for ML workloads the comparison isn't between slotted CPU RAM and Apple's unified RAM; it's between Apple's unified RAM and dedicated GPU VRAM, which can more than double even the M3 Ultra's bandwidth at up to 1.8TB/sec. Apple Silicon makes a unique set of trade-offs that shine in certain areas, but they are still trade-offs nonetheless, so it really depends on what exactly you're doing with the hardware.
On the other hand, dedicated GPU VRAM comes in much smaller capacities than the unified RAM you can get on Mac platforms. That's a big deal for SOTA LLMs, which combine a large memory footprint with a need for high memory bandwidth to reach acceptable performance: you need the whole model to fit in fast memory, and then bandwidth determines how quickly you can generate tokens.
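To see why bandwidth dominates, here's a back-of-envelope sketch: during decoding, every generated token requires reading roughly all of the model's weights from memory, so an upper bound on throughput is bandwidth divided by model size. The bandwidth figures and the 4-bit 70B example model below are illustrative assumptions, not benchmarks of any specific hardware.

```python
# Rough upper bound on LLM decode throughput: each generated token
# reads ~all model weights once, so tokens/sec <= bandwidth / model size.
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Hypothetical example: a 70B-parameter model quantized to 4 bits/weight
model_size_gb = 70e9 * 0.5 / 1e9  # ~35 GB of weights

print(round(max_tokens_per_sec(800, model_size_gb)))   # ~800 GB/s unified memory
print(round(max_tokens_per_sec(1800, model_size_gb)))  # ~1.8 TB/s GPU VRAM
```

Real throughput is lower (attention KV-cache reads, kernel overhead), but the ratio between the two platforms tracks the bandwidth ratio, which is why VRAM wins when the model fits and loses outright when it doesn't.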