Waiting 15+ seconds to test small changes to my PyTorch training code on NFS is rather annoying. I know there are ways to work around it, but sometimes I wish we could have a training workflow similar to how Revise works. Make changes to the code, Revise patches it, then run it via a REPL on the main node. Not sure if Revise actually works in a distributed context, but that would be amazing if it did. No need to start/fork a million new Python processes every single time.
Of course I would also rather be doing all of the above in Julia instead of Python ;)
Revise can work on your server for hot reloading if you need it - you copy your new code files in place over the old ones.
Of course there are caveats - it won't update actively running code, but if your code it's structured reasonably and you are aware of Revise's API and the very basics of Julia's world age you can do it pretty easily IME.