Hacker News

Chomsky and the Two Cultures of Statistical Learning (2011)

93 points by atomicnature, last Tuesday at 9:33 AM | 74 comments

Comments

MoravecsParadox, today at 5:01 PM

> derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don't try to understand the meaning of that behavior.

It's crazy how wrong Chomsky was about machine learning. Maybe the real truth is that humans are stochastic parrots with an underlying probability distribution, and because gradient descent is so good at reproducing probability distributions, LLMs are incredibly good at reproducing language.
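
To make the "gradient descent reproduces probability distributions" part concrete, here is a minimal sketch, not a claim about how any real LLM is trained. It fits softmax logits to a small fixed distribution by plain gradient descent on cross-entropy. The `target` distribution, the step count, and the learning rate are all made-up values for illustration.

```python
import math

# Hypothetical "next word" distribution over four outcomes (illustrative only).
target = [0.5, 0.3, 0.15, 0.05]

def softmax(logits):
    """Turn unnormalized scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fit(target, steps=2000, lr=0.5):
    """Minimize cross-entropy H(target, softmax(logits)) by gradient descent."""
    logits = [0.0] * len(target)  # start from the uniform distribution
    for _ in range(steps):
        q = softmax(logits)
        # The gradient of the cross-entropy with respect to the logits is q - target.
        logits = [z - lr * (qi - pi) for z, qi, pi in zip(logits, q, target)]
    return softmax(logits)

learned = fit(target)
print([round(p, 3) for p in learned])
```

This only shows that the optimizer can match a distribution it is handed directly. It says nothing about whether human language actually is such a distribution, which is the contested part.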

intalentive, today at 3:34 AM

This essay is missing the words “cause” and “causal”. There is a difference between discovering causes and fitting curves. The search for causes guides the design of experiments, and with luck, the derivation of formulae that describe the causes. Norvig seems to be confusing the map (data, models) for the territory (causal reality).

barrenko, today at 5:36 AM

Is this Bayesian vs. frequentist?

codeulike, today at 9:12 AM

(this is from 2017)

pmkary, today at 9:45 AM

I have many books by Chomsky, and I want to throw them away because it disgusts me to have them. Then I think, why should I throw away things I spent so much on? That makes me angrier. So I have piled them up somewhere until I figure out what to do with them, and each time I walk past the pile I feel sad that I ever came across his work.

bo1024, today at 4:08 AM

Is this essay from 2011?

scottoreily, today at 1:10 PM

[dead]

cubefox, today at 9:51 AM

(2011)

oldpersonintx2, today at 4:22 PM

[dead]

templar_snow, today at 2:50 AM

[flagged]

ur-whale, today at 9:02 AM

[flagged]

tripletao, today at 5:26 AM

Here's Chomsky quoted in the article, from 1969:

> But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.

He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. That seems to have been a fundamental theoretical failing rather than the computational limits of the time, if he couldn't imagine any framework in which a novel sentence had probability other than zero.
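
As a rough illustration of a framework in which a novel sentence has nonzero probability, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. This is a textbook technique chosen for brevity, not anything taken from the article or from Chomsky. The three-sentence corpus and the two test sentences are invented for the example.

```python
import math
from collections import Counter

# Tiny made-up corpus (illustrative only).
corpus = [
    "the dog chased the cat",
    "the cat saw the bird",
    "a bird saw the dog",
]

BOS, EOS = "<s>", "</s>"

def tokens(sentence):
    """Split on whitespace and add sentence boundary markers."""
    return [BOS] + sentence.split() + [EOS]

context_counts = Counter()  # how often each token appears as a left context
bigram_counts = Counter()   # how often each (previous, current) pair appears
for sentence in corpus:
    toks = tokens(sentence)
    context_counts.update(toks[:-1])
    bigram_counts.update(zip(toks, toks[1:]))

# Tokens that can be predicted: every word plus the end marker (never BOS).
vocab = {w for s in corpus for w in s.split()} | {EOS}
V = len(vocab)

def log_prob(sentence):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    toks = tokens(sentence)
    total = 0.0
    for prev, cur in zip(toks, toks[1:]):
        # Add-one smoothing: unseen bigrams get a small but nonzero probability.
        total += math.log((bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + V))
    return total

novel = "the bird chased a cat"      # uses known words, never seen as a sentence
scrambled = "cat a chased bird the"  # same words in a shuffled order

print("novel:    ", math.exp(log_prob(novel)))
print("scrambled:", math.exp(log_prob(scrambled)))
```

Neither sentence occurs in the corpus, yet both get a probability above zero, and the well-formed one scores higher than the shuffled one. A model this crude is obviously nowhere near an LLM; the point is only that "probability of a novel sentence" is well defined once you smooth or generalize over parts.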

I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see.
