Training energy is amortized across the lifespan of a model. For any given query for the most popular commercial models, your share of the energy used to train it is a small fraction of the energy used for inference (e.g. 10%).