In this case, the “useless” work is the cost of moving and distributing the threads between different compute clusters. That cost is nonzero and does need to be factored in, but it is far outweighed by the benefits gained from making the move.