the article offers a simplified world model: Poisson arrivals and infinite queue, which is fine as a math model.
In the real world however, the bursts can be correlated, due to factors like timeouts/retries, thundering herd, correlated bursts.
so the real economics of load-balanced system is a simple reliability story: being able to reasonably serve the peak traffic, which leads to over-provisioning of those systems.
using cloud allows some form of scale up/down of resources, but doesn't completely solve the problem. I think the migration away from synchronyous systems towards async systems and letting clients gradually absorb the delays is a better approach (rather than forcing infrastructure to be dynamically scaled up/down and be billed per request-second by your cloud provider)
>In the real world however, the bursts can be correlated
Very true, as application-layer load-balancing often explicitly pre-bakes the traffic schedule to several hundred distributed IPs for data locality. Essentially bypassing the functional need for DNS and local round-robin traffic balancers.
One trades concurrent bandwidth for slightly higher latency, and dynamically adapted capacity as traffic load changes. =3
Another technique to add to the mix if you can handle the additional complexity is to load or feature shed. If you can delay or just drop additional expensive application features during the exact time you need to scale or handle a burst, then your system has additional core app logic to handle requests. This can prevent the system getting wedged in a positive feedback loop.
See also the gamedev technique of having sacrificial assets or code, so when you need to free up space late in the schedule to ship, you have something you can actually shed.