
plasticeagle, yesterday at 8:45 PM

I have had conversations at work, with people who I have reason to believe are smart and critical, in which they made the claim that humans and AI basically learn in the same way. My response to them, as to anyone who makes this claim, is that the amount of data ingested by someone with severe sensory dysfunction of one sort or another is very small by comparison. Helen Keller is the obvious extreme example, but even a person who is simply blind is limited to the bandwidth of their hearing.

And yet, nobody would argue that a blind person is any less intelligent than a sighted person. And so the amount of data a human ingests is not correlated with intelligence. Intelligence is something else.

When LLMs were first proposed as useful tools for examining data and providing answers to questions, I wondered to myself how they would solve the problem of there being no a priori knowledge of truth in the models. How they would find a way of sifting their terabytes of training data so that the models learnt only true things.

Imagine my surprise that not only did they not attempt to do this, but most people did not appear to understand that this was a fundamental and unsolvable problem at the heart of every LLM that exists anywhere. That LLMs, without this knowledge, are just random answer generators. Many, many years ago I wrote a fun little Markov-chain generator I called "Talkback", which you could feed a short story to and then have a chat with. It enjoyed brief popularity at the University I attended. You could ask it questions and it would sort-of answer. Nobody, least of all myself, imagined that the essential unachievable idea - "feed in enough text and it'll become human" - would actually be a real idea in real people's heads.
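For anyone who hasn't seen one, a word-level Markov-chain chat toy of roughly this kind can be sketched in a few lines of Python. To be clear, this is not the Talkback code, just a minimal illustration of the general technique; the names `build_chain` and `reply`, the `order` parameter, and the way the prompt seeds the reply are all assumptions of mine.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def reply(chain, prompt, max_words=20, rng=None):
    """Produce a 'reply' by walking the chain, seeded from the prompt if possible."""
    if not chain:
        return ""
    rng = rng or random.Random()
    order = len(next(iter(chain)))

    # Look for any run of words in the prompt that also occurs in the story.
    prompt_words = prompt.split()
    candidates = [
        tuple(prompt_words[i:i + order])
        for i in range(len(prompt_words) - order + 1)
    ]
    seeds = [state for state in candidates if state in chain]

    # Otherwise start from a random state taken from the story.
    state = rng.choice(seeds) if seeds else rng.choice(sorted(chain))

    out = list(state)
    while len(out) < max_words:
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break  # dead end: this run of words was never followed by anything
        out.append(rng.choice(followers))
    return " ".join(out)
```

Feed `build_chain` a short story, then call `reply(chain, "your question here")`. Nothing in it checks whether a reply is true; it only tracks which words followed which in the story.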

This part of your answer, though:

"My paper and pen version of the latest LLM .... My paper and pen version of the latest LLM"

Is just a variation of the Chinese Room argument, and I don't think it holds water by itself. It's not that it's just an algorithm, it's that learning anything usefully correct from the entire corpus of human literary output by itself is fundamentally impossible.