The python overhead of launching big ML jobs is nontrivial, so I think speeding that up would be meaningful. (I mean the initial tracing and other setup, not things once the GPUs are actually doing the work).