If you are curious about doing something similar on TPUs, Google has an article: https://developers.googleblog.com/train-gpt2-model-with-jax-...