Is it just me, or does it feel like the "empirical argument" is correct (it is an observation, after all) but the "theoretical argument" is wildly off?
My understanding is that the different levels of cache and memory are implemented quite differently to trade density against speed. That is, this scaling is not the result of some natural geometric law; it comes about because the designers built those chips that way to serve the expected workloads. Some chips, such as IBM's mainframe CPUs, have huge caches that might not follow the same scaling.
I'm no performance expert, but this struck me as odd.
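For what it's worth, the empirical side is easy to poke at yourself. Here's a rough pointer-chasing sketch (my own, not from the article) that walks a random cycle so the prefetcher can't help, and times one hop as the working set grows. The function name and sizes are just illustrative choices:

```python
import random
import time

def chase_ns_per_hop(n_elements, hops=200_000):
    """Time a random pointer chase over n_elements slots; return ns per hop.

    Python's interpreter overhead per hop is large but roughly constant,
    so the *trend* across sizes still reflects the memory hierarchy.
    """
    perm = list(range(n_elements))
    random.shuffle(perm)
    # Link the shuffled indices into one big cycle, so every hop
    # is a dependent load the CPU can't easily predict.
    nxt = [0] * n_elements
    for a, b in zip(perm, perm[1:] + perm[:1]):
        nxt[a] = b
    i = 0
    t0 = time.perf_counter()
    for _ in range(hops):
        i = nxt[i]
    return (time.perf_counter() - t0) / hops * 1e9

for size in (1 << 10, 1 << 14, 1 << 18, 1 << 22):
    print(f"{size:>8} slots: {chase_ns_per_hop(size):.1f} ns/hop")
```

On most machines the ns/hop figure creeps up as the working set outgrows each cache level; a C version would show the effect much more cleanly, but even this toy makes the trend visible.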
How else are you gonna pack bits onto a physical chip except by putting them into a cube? What's the longest path in a cube? What's the average path length in a cube? They're all functions of the surface area of the cube.
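The geometric claim above can be sketched back-of-envelope (my own toy model, not from the thread): pack N bits at a fixed density into a cube and the side grows as N^(1/3), so the worst-case wire length, the space diagonal, does too. At a fixed signal speed that bounds latency by the cube root of capacity:

```python
# Toy model: N bits at a fixed density of 1 bit per unit volume.

def cube_side(n_bits, bits_per_unit_volume=1.0):
    """Side length of the smallest cube holding n_bits at the given density."""
    return (n_bits / bits_per_unit_volume) ** (1 / 3)

def longest_path(n_bits):
    """Space diagonal of that cube: side * sqrt(3)."""
    return cube_side(n_bits) * 3 ** 0.5

# Growing capacity 1000x grows the worst-case path only 10x:
print(round(longest_path(1_000_000) / longest_path(1_000), 6))  # -> 10.0
```

The same cube-root growth holds for the average path length, just with a smaller constant in front.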
The theoretical argument seems sound, but it ignores that current implementations carry massive constant factors beyond the theoretical limit alone (particularly cost and heat), and it skips over explaining why those end up having similar growth rates.
The actual formula used (at the bottom of the ChatGPT screenshot) includes corrections for some of these factors; without them it would have the right growth rate but yield nonsense numbers.