In the register file or named registers?
And the critical matrix tiling size is often set by SRAM capacity, so the L3 unified cache.
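
As a rough illustration of how SRAM capacity can bound a tile size, here is a minimal sketch. It assumes square tiles and that three tiles (one each from A, B, and C) must be resident at once. The function name `max_square_tile` and its parameters are hypothetical, not taken from the discussion, and real kernels usually pick a smaller tile to leave room for other data and to match vector widths.

```python
import math


def max_square_tile(sram_bytes: int, elem_bytes: int = 4, n_tiles: int = 3) -> int:
    """Largest T such that n_tiles tiles of T x T elements fit in sram_bytes.

    Assumption: square tiles, all n_tiles tiles resident at the same time.
    """
    if sram_bytes <= 0 or elem_bytes <= 0 or n_tiles <= 0:
        raise ValueError("all arguments must be positive")
    # n_tiles * T^2 * elem_bytes <= sram_bytes  =>  T <= sqrt(sram_bytes / (n_tiles * elem_bytes))
    return math.isqrt(sram_bytes // (n_tiles * elem_bytes))


# Example: a 32 MiB cache holding fp32 tiles.
print(max_square_tile(32 * 1024 * 1024))
```

The result is an upper bound only; associativity, sharing with other cores, and prefetch behavior typically push the practical tile size lower.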