> top-tier GPU time is just so valuable that they always want to be training full speed ahead.
I don't think this makes much sense: dropping from a 99.999% duty cycle to 99% forgoes only ~1% of hardware capacity. That loss is linear in the fraction of capacity given up, whereas the power-cost savings from shaving the costliest peaks and shifting that demand into the troughs grow superlinearly, because spot electricity prices are heavily right-skewed and the priciest 1% of hours carries far more than 1% of the bill.
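To make the asymmetry concrete, here is a minimal sketch of that comparison. The numbers are illustrative assumptions, not real data: a lognormal distribution stands in for a skewed spot-price curve, and `power_mw` is a hypothetical constant training load.

```python
import numpy as np

# Illustrative (made-up) hourly spot prices for one year, $/MWh.
# Real spot prices are right-skewed: a few peak hours cost several
# times the median hour. A lognormal is a stand-in for that shape.
rng = np.random.default_rng(0)
prices = np.sort(rng.lognormal(mean=4.0, sigma=0.6, size=8760))

power_mw = 100.0  # hypothetical constant training load

# Baseline: train through every hour (~100% duty cycle).
baseline_cost = power_mw * prices.sum()

# Peak shaving: pause training during the costliest 1% of hours,
# forgoing ~1% of compute. Prices are sorted ascending, so the
# slice below drops the most expensive hours.
n_shaved = int(0.01 * len(prices))
shaved_cost = power_mw * prices[:-n_shaved].sum()

capacity_forgone = n_shaved / len(prices)
cost_saved = 1 - shaved_cost / baseline_cost
print(f"capacity forgone: {capacity_forgone:.1%}")  # ~1.0%, linear
print(f"power cost saved: {cost_saved:.1%}")        # several percent
```

Under these assumed parameters the costliest 1% of hours accounts for roughly 4% of the annual power bill, so a ~1% linear capacity loss buys a several-times-larger cost reduction; the more skewed the price distribution, the starker the gap.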