This comparison only works if you assume scaling keeps paying off. Sara Hooker's research shows (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5877662) compact models now outperform massive predecessors and scaling laws only predict pre-training loss, not downstream performance. If marginal returns on compute are falling (https://philippdubach.com/posts/the-most-expensive-assumptio...), "energy per query" hides the real problem, a trillion dollars of infrastructure built on the bet that they won't.