For example, an adder's total delay depends on a carry chain. If you have N 4-bit slices, the last slice has to wait for the carry to propagate through all N-1 previous slices.
But if you duplicate all your slices, you can have the results for both carry = 0 and carry = 1 inputs. Then just switch which one is correct - total time 1 add plus N-1 switches.
Just for double (and change) the hardware. Cheap.