Extremely well said. Universal approximation is necessary but not sufficient for the performance we are seeing. The secret sauce is implicit regularization, which comes about analogously to enforcing compression.