This essay is missing the words “cause” and “causal”. There is a difference between discovering causes and fitting curves. The search for causes guides the design of experiments and, with luck, the derivation of formulae that describe those causes. Norvig seems to be mistaking the map (data, models) for the territory (causal reality).
I had this exact reaction. Without any discussion of "causal modeling", the whole thing seems horribly out of touch with the real issues here. You can have explanatory and predictive models that are causal, or explanatory and predictive models that are non-causal, and that is the actual issue, not "explanation" vs. "prediction", which is not a tight enough distinction.
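To make the causal vs. non-causal distinction concrete, here is a toy sketch (not from the essay; the variable names, coefficients, and the function `slopes_under_confounding` are all made up for illustration). A hidden confounder Z drives both X and Y, and X has no effect on Y at all. A fitted line of Y on X still predicts well on observational data, but the relationship vanishes once X is set by intervention.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slopes_under_confounding(n=20000, seed=0):
    """Return (observational slope, interventional slope) of Y on X.

    Assumed toy structure: Z -> X and Z -> Y, with no arrow X -> Y.
    """
    rng = random.Random(seed)
    obs_x, obs_y, int_x, int_y = [], [], [], []
    for _ in range(n):
        # Observational regime: X and Y share the common cause Z.
        z = rng.gauss(0, 1)
        obs_x.append(z + 0.5 * rng.gauss(0, 1))
        obs_y.append(2 * z + 0.5 * rng.gauss(0, 1))

        # Interventional regime: X is set from outside, independent of Z.
        z = rng.gauss(0, 1)
        int_x.append(rng.gauss(0, 1))
        int_y.append(2 * z + 0.5 * rng.gauss(0, 1))
    return slope(obs_x, obs_y), slope(int_x, int_y)


if __name__ == "__main__":
    observational, interventional = slopes_under_confounding()
    print("slope of Y on X, observational: ", round(observational, 3))
    print("slope of Y on X, interventional:", round(interventional, 3))
```

With these made-up coefficients the observational slope should sit near cov(X, Y) / var(X) = 2 / 1.25 = 1.6, while the interventional slope should sit near zero. Both fits are "predictive" in their own regime; only a model of the Z -> X, Z -> Y structure tells you which one to trust when you act on X.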
This essay frequently uses the word "insight", and its primary topic is whether an empirically fitted statistical model can provide that (with Norvig arguing for yes, in my opinion convincingly). How does that differ from your concept of a "cause"?
A related* essay (2010) by a statistician on the goals of statistical modelling that I've been procrastinating on:
https://www.stat.berkeley.edu/~aldous/157/Papers/shmueli.pdf
To Explain Or To Predict?
Nice quote:
We note that the practice in applied research of concluding that a model with a higher predictive validity is “truer,” is not a valid inference. This paper shows that a parsimonious but less true model can have a higher predictive validity than a truer but less parsimonious model.
Hagerty+Srinivasan (1991)
*like TFA it's a sorta review of Breiman
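The quoted claim is easy to reproduce in a small simulation. The following is a rough sketch under assumptions of my own (the coefficients, sample sizes, and the function `compare_out_of_sample_mse` are illustrative, not taken from the paper): the data really come from y = b1*x1 + b2*x2 + noise with a small b2 and strongly correlated predictors, and we compare the correctly specified two-predictor fit against a "less true" fit that drops x2, scoring both on fresh data.

```python
import random


def compare_out_of_sample_mse(n_train=10, n_test=100, n_sims=1000,
                              b1=1.0, b2=0.1, rho=0.9, sigma=1.0, seed=0):
    """Return (MSE of the true-form model, MSE of the parsimonious model),
    both measured on held-out data and averaged over n_sims repetitions."""
    rng = random.Random(seed)

    def draw(n):
        rows = []
        for _ in range(n):
            x1 = rng.gauss(0, 1)
            # x2 is correlated with x1 (correlation rho), unit variance.
            x2 = rho * x1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
            y = b1 * x1 + b2 * x2 + rng.gauss(0, sigma)
            rows.append((x1, x2, y))
        return rows

    sse_full = sse_small = 0.0
    for _ in range(n_sims):
        train, test = draw(n_train), draw(n_test)

        # Sufficient statistics for least squares without an intercept
        # (the data-generating process has none).
        s11 = sum(x1 * x1 for x1, _, _ in train)
        s12 = sum(x1 * x2 for x1, x2, _ in train)
        s22 = sum(x2 * x2 for _, x2, _ in train)
        s1y = sum(x1 * y for x1, _, y in train)
        s2y = sum(x2 * y for _, x2, y in train)

        # "Truer" model: both predictors, via the 2x2 normal equations.
        det = s11 * s22 - s12 * s12
        f1 = (s22 * s1y - s12 * s2y) / det
        f2 = (s11 * s2y - s12 * s1y) / det

        # Parsimonious model: x1 only (misspecified, since b2 != 0).
        g1 = s1y / s11

        for x1, x2, y in test:
            sse_full += (y - f1 * x1 - f2 * x2) ** 2
            sse_small += (y - g1 * x1) ** 2

    total = n_sims * n_test
    return sse_full / total, sse_small / total


if __name__ == "__main__":
    mse_full, mse_small = compare_out_of_sample_mse()
    print("held-out MSE, true two-predictor form:", round(mse_full, 3))
    print("held-out MSE, x1-only model:         ", round(mse_small, 3))
```

With a small training set and a weak, collinear second predictor, the variance cost of estimating the extra coefficient outweighs the bias from leaving it out, so the x1-only model should come out ahead on held-out error even though it has the wrong form. Raising `n_train` or `b2` enough should flip the ordering, which is the point: predictive ranking depends on sample size and noise, not only on which model is closer to the truth.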