> Remat can produce a performance boost even when everything has a register.
Can you give an example?
Rematerializing 'safe' computation from across a barrier or thread sync/wait works wonders.
It also works for loads, stores, and function calls, but that's finicky to tune, so we usually tell people to update their programs by hand when they need it.
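To make that concrete, here is a minimal hand-written sketch of the transformation, not actual compiler output. Everything named here is an assumption for illustration: `sync_point()` is a hypothetical stand-in for a barrier or thread sync/wait, and `slot_index()` is a made-up example of cheap, side-effect-free computation. In the first version the index stays live across the sync; in the second it is recomputed afterwards, so nothing but the loaded value has to survive the sync.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a barrier or thread sync/wait.
// Real code would call an actual barrier here.
inline void sync_point() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Hypothetical 'safe' computation: cheap, pure, depends only on its inputs.
inline std::uint32_t slot_index(std::uint32_t tid, std::uint32_t stride) {
    return tid * stride + (tid & 3u);
}

// Version A: idx is computed once and kept live across the sync.
std::uint32_t keep_live(const std::uint32_t* buf,
                        std::uint32_t tid, std::uint32_t stride) {
    std::uint32_t idx = slot_index(tid, stride);
    std::uint32_t before = buf[idx];
    sync_point();
    return before + buf[idx + 1];
}

// Version B: idx is rematerialized (recomputed) after the sync,
// so it does not have to be preserved across it.
std::uint32_t rematerialized(const std::uint32_t* buf,
                             std::uint32_t tid, std::uint32_t stride) {
    std::uint32_t before = buf[slot_index(tid, stride)];
    sync_point();
    std::uint32_t idx = slot_index(tid, stride);
    return before + buf[idx + 1];
}
```

Both versions return the same result. Whether version B is actually faster depends on the target and on what the sync costs per live value, so treat this as the shape of the rewrite rather than a guaranteed win.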