Obviously yes but NVIDIA Ampere/Hopper architecture has 64k 32-bit registers per SM. A100 has 108 SMs and H100 has 132 SMs so go figure - registers aren't a bottleneck.