Fair enough, but 1024 is the max threads per block, and even at roughly 1.5k resident threads per SM you wouldn't get anywhere near 1 million, given the 5090 only has 170 SMs (192 is the full GB202 die)...
Future-proofing, I guess...
Just like two threads can run on the same CPU core at the "same" time (interleaved by the scheduler, with no synchronization between them), the same is true for GPU threads/thread groups.
I guess they never say that they execute at the same time technically haha
You can launch many more logical threads than there are physical threads available. The GPU scheduler automatically dispatches the thread blocks to the SMs as they free up.
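If it helps, here's a minimal sketch of that, assuming a machine with a CUDA-capable GPU and `nvcc`. The names `mark` and `run_logical_threads` are made up for illustration. It prints how many threads the device can keep resident at once, then launches about a million logical threads anyway and counts how many actually ran.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each logical thread marks one element; the bounds check guards the last block.
__global__ void mark(unsigned int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1u;
}

// Hypothetical helper: launches n logical threads and returns how many
// of them wrote their mark, or -1 if any CUDA call failed.
long long run_logical_threads(int n) {
    unsigned int *d = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(unsigned int);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1;
    cudaMemset(d, 0, bytes);

    int block = 256;                      // threads per block (hardware max is 1024)
    int grid = (n + block - 1) / block;   // enough blocks to cover all n threads
    mark<<<grid, block>>>(d, n);

    // Catch both launch errors and errors raised while the kernel ran.
    if (cudaGetLastError() != cudaSuccess ||
        cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(d);
        return -1;
    }

    std::vector<unsigned int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    long long ran = 0;
    for (unsigned int v : h) ran += v;
    return ran;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    // Upper bound on threads that can be resident on the whole GPU at once.
    long long resident =
        static_cast<long long>(prop.multiProcessorCount) *
        prop.maxThreadsPerMultiProcessor;
    std::printf("SMs: %d, max resident threads: %lld\n",
                prop.multiProcessorCount, resident);

    int n = 1 << 20;  // about a million logical threads
    std::printf("launched %d logical threads, %lld ran\n",
                n, run_logical_threads(n));
    return 0;
}
```

The launch goes through even though `n` is several times larger than the resident-thread figure, because blocks that don't fit just wait until an SM has room for them.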