I am starting to believe that OAI might actually succeed at getting per-token inference cost to where it needs to be. Or that it's already there in principle.
Wafer-scale compute is a very big deal. Most of HN is probably still unaware that you can get tokens out of one of these devices right now via public API offerings.