Hacker News

pests 06/25/2025

Computation happens in the registers. If you’re not moving data to registers you aren’t doing any compute.


Replies

menaerus 07/02/2025

Obviously yes, but NVIDIA's Ampere/Hopper architectures have 64K 32-bit registers per SM, i.e. 256 KB of register file per SM. The A100 has 108 SMs and the H100 has 132, so go figure: registers aren't a bottleneck.
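
To put a number on "go figure", here is a rough back-of-the-envelope sketch in Python. It only multiplies out the figures quoted in the comment (64K 32-bit registers per SM, 108 and 132 SMs); the names `REGISTERS_PER_SM`, `SM_COUNTS`, and `register_file_bytes` are made up for illustration, and it says nothing about per-thread register limits or occupancy.

```python
# Figures quoted in the comment: 64K 32-bit registers per SM.
REGISTERS_PER_SM = 64 * 1024
BYTES_PER_REGISTER = 4  # 32 bits

# SM counts as quoted in the comment.
SM_COUNTS = {"A100": 108, "H100": 132}


def register_file_bytes(sm_count: int) -> int:
    """Total register file capacity across all SMs, in bytes."""
    return sm_count * REGISTERS_PER_SM * BYTES_PER_REGISTER


for gpu, sms in SM_COUNTS.items():
    mib = register_file_bytes(sms) / 2**20
    print(f"{gpu}: {mib:.0f} MiB of registers")
# prints:
# A100: 27 MiB of registers
# H100: 33 MiB of registers
```

So with those numbers the aggregate register file comes to roughly 27 MiB on an A100 and 33 MiB on an H100.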