Chomsky and the Two Cultures of Statistical Learning (2011)

93 points • by atomicnature • last Tuesday at 9:33 AM • 74 comments

Comments

MoravecsParadox • today at 5:01 PM

> derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don't try to understand the meaning of that behavior.

It's crazy how wrong Chomsky was about machine learning. Maybe the real truth is that humans are stochastic parrots with an underlying probability distribution, and because gradient descent is so good at reproducing probability distributions, LLMs are incredibly good at reproducing language.
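A toy sketch of the mechanism this comment appeals to, assuming a single categorical distribution over three outcomes rather than an actual language model. The function names `softmax` and `fit_distribution` are hypothetical, chosen for illustration. Gradient descent on cross-entropy nudges softmax logits until the model's probabilities match the observed frequencies; real LLM training conditions on context and runs at vastly larger scale, so take this only as the basic intuition.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_distribution(counts, steps=2000, lr=0.5):
    """Hypothetical helper: fit softmax logits to observed counts
    by plain gradient descent on mean cross-entropy."""
    n = sum(counts)
    target = [c / n for c in counts]  # empirical frequencies
    logits = [0.0] * len(counts)      # start from the uniform distribution
    for _ in range(steps):
        probs = softmax(logits)
        # The gradient of cross-entropy w.r.t. each logit is (p_i - target_i).
        logits = [z - lr * (p - t) for z, p, t in zip(logits, probs, target)]
    return softmax(logits)


if __name__ == "__main__":
    print(fit_distribution([50, 30, 20]))
```

Run on counts of 50, 30 and 20, the fitted probabilities end up very close to 0.5, 0.3 and 0.2, the frequencies in the "data".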

intalentive • today at 3:34 AM

This essay is missing the words “cause” and “causal”. There is a difference between discovering causes and fitting curves. The search for causes guides the design of experiments, and with luck, the derivation of formulae that describe the causes. Norvig seems to be confusing the map (data, models) for the territory (causal reality).

barrenko • today at 5:36 AM

Is this bayesian vs. frequentist?

codeulike • today at 9:12 AM

(this is from 2017)

pmkary • today at 9:45 AM

I have many books by Chomsky, and I want to throw them away because it disgusts me to have them. Then I think, why should I throw away things I spent so much on? That makes me even angrier. So I have piled them up somewhere while I figure out what to do with them, and each time I walk past the pile I feel sad that I ever came across his work.

bo1024 • today at 4:08 AM

Is this essay from 2011?

scottoreily • today at 1:10 PM

[dead]

cubefox • today at 9:51 AM

(2011)

oldpersonintx2 • today at 4:22 PM

[dead]

templar_snow • today at 2:50 AM

[flagged]

ur-whale • today at 9:02 AM

[flagged]

tripletao • today at 5:26 AM

Here's Chomsky quoted in the article, from 1969:

> But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.

He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. That seems to have been a fundamental theoretical failing rather than the computational limits of the time, if he couldn't imagine any framework in which a novel sentence had probability other than zero.

I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see.
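The comment above argues that a framework giving novel sentences nonzero probability was within reach. A minimal sketch of one such framework: a bigram model with add-one (Laplace) smoothing, trained on a hypothetical two-sentence corpus. The helper names `train_bigram` and `sentence_prob` are illustrative assumptions, and smoothing this crude is far from what modern models do. The only point is that an unseen word sequence need not be assigned zero.

```python
from collections import Counter


def train_bigram(corpus):
    """Hypothetical helper: count history words and word pairs."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])           # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, vocab


def sentence_prob(sentence, unigrams, bigrams, vocab):
    """Product of add-one smoothed bigram probabilities P(word | prev)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing: an unseen pair counts as 1 instead of 0,
        # so no sentence built from known words gets probability zero.
        p *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + v)
    return p


if __name__ == "__main__":
    model = train_bigram(["the cat sleeps", "the dog barks"])
    for s in ["the cat sleeps", "the dog sleeps", "sleeps dog the"]:
        print(s, sentence_prob(s, *model))
```

With this corpus, the unseen sentence "the dog sleeps" scores 1/432, half the 1/216 of the training sentence "the cat sleeps", while the scrambled "sleeps dog the" scores only 1/5184.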
