My bet is that the amount of work needed per generated token will decrease over time, and that models will get smaller for the same performance as we learn to optimize them, so costs and hardware requirements will go down.