What’s the current state of the art, for each of training locally and in the cloud, for learning my voice?
Local? No idea. Cloud? ElevenLabs, probably. But it's described as "cloning", not "training". Not sure what the distinction is or why it matters if the end result is that you can generate any TTS that sounds like you. There might very well be an important one, I just don't know it.
For open weights, I would say S2: https://github.com/rodrigomatta/s2.cpp
Locally maybe https://voicebox.sh/
ElevenLabs in the cloud.