> I've seen claims that the M5 Max is roughly equivalent to ~4000 CUDA cores
Who claimed that? The M5 is still a raster-focused GPU, dedicated matmul blocks be damned. For some workloads that napkin math might work out, but for many others it's a wild overshoot. Time-to-first-token still favors CUDA, and real-world training workloads aren't getting anywhere near Apple Silicon.
All of the memory bandwidth in the world is useless if you spend 15 minutes processing 64k tokens' worth of context prefill. Prefill is compute-bound, not bandwidth-bound, and this is where CUDA shines.
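
To put rough numbers on the prefill point, here's a napkin-math sketch. It assumes the usual approximation of about 2 FLOPs per parameter per token for a dense transformer (ignoring the attention term, which only makes long prompts worse), and that decode has to read every weight once per generated token. The model size, throughput, and bandwidth values are hypothetical placeholders, not measurements of any real chip.

```python
# Napkin math: prefill is limited by compute, decode by memory bandwidth.
# Every hardware/model number here is a hypothetical placeholder.

def prefill_seconds(params: float, prompt_tokens: int, flops_per_s: float) -> float:
    """Rough prefill time: ~2 FLOPs per parameter per prompt token."""
    return 2.0 * params * prompt_tokens / flops_per_s

def decode_tokens_per_s(params: float, bytes_per_param: float,
                        bandwidth_bytes_per_s: float) -> float:
    """Rough decode rate: each new token streams all weights from memory once."""
    return bandwidth_bytes_per_s / (params * bytes_per_param)

# The scenario from the comment: 64k tokens taking 15 minutes.
implied_prefill_rate = 64_000 / (15 * 60)  # prompt tokens per second

# Hypothetical 70B dense model, 4-bit weights, 50 TFLOP/s sustained, 500 GB/s.
example_prefill = prefill_seconds(70e9, 64_000, 50e12)
example_decode = decode_tokens_per_s(70e9, 0.5, 500e9)

print(f"implied prefill rate: {implied_prefill_rate:.1f} tok/s")
print(f"example prefill time: {example_prefill:.1f} s")
print(f"example decode rate:  {example_decode:.1f} tok/s")
```

Bandwidth only shows up in the decode formula. Doubling it does nothing for the time spent waiting on the first token.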