Hacker News

nerdponx | 11/08/2024

The whole innovation of GPT and LLMs in general is that an autoregressive model can make alarmingly good next-token predictions with the right inductive bias, a large number of parameters, a long context window, and a huge training set.
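A toy sketch of what that means in practice (my illustration, not part of the original comment): a bigram counter stands in for the transformer, so the "context window" is a single token and the "training set" is one sentence, but the generation loop has the same shape GPT uses, in that each sampled token is appended to the context and fed back in as the input for the next prediction.

    import random
    from collections import Counter, defaultdict

    # Toy "training set"; real LLMs use subword tokens and a long context.
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # "Training": count how often each token follows each other token.
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def next_token(prev):
        # Sample the next token in proportion to p(next | prev).
        dist = counts[prev]
        return random.choices(list(dist), weights=list(dist.values()))[0]

    # Autoregressive loop: each output becomes input for the next step.
    token, output = "the", ["the"]
    for _ in range(8):
        token = next_token(token)
        output.append(token)
    print(" ".join(output))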

It turns out that human communication is quite a lot more "autoregressive" than people had assumed until now. And that includes some level of reasoning capability, arising out of a kind of brute-force pattern matching. It has limits, of course, but it's amazing that it works as well as it does.


Replies

HarHarVeryFunny | 11/10/2024

It is amazing, and interesting.

Although I used the word myself, I'm not sure that "autoregressive" is quite the right word for how either LLMs or our brains work; maybe it's better to just call both "predictive". In both cases the predictive inputs include the sequence itself (or selected parts of it, at varying depths of representation), but also global knowledge, both factual and procedural (HOW to represent the sequence). In the case of our brains there are many more inputs that can be used, such as sensory ones (passive observations, or action feedback), emotional state, etc.
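One way to make that distinction concrete (my notation, not the parent's): an autoregressive model factorizes the sequence probability using only the sequence so far,

    p(x_1, ..., x_T) = ∏_t p(x_t | x_{<t})

whereas the broader "predictive" framing conditions each prediction on whatever else happens to be available:

    p(x_t | x_{<t}, global knowledge, sensory input, internal state, ...)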

Regardless of which predictive inputs are available to LLMs vs. brains, it does seem that in a lot of cases the more constrained inputs of an LLM don't prevent it from sounding very human-like (not surprising at some level, given the training goal). An LLM chat window also creates a "level playing field" (i.e. an impoverished input setting for the human) where each side sees the other only as a stream of text. Maybe in this setting the human, when not reasoning, really isn't bringing much more predictive machinery to the table than the LLM/transformer!

Notwithstanding the predictive nature of LLMs, I can't help but also see them as expert systems of sorts, albeit ones that have derived their own rules (many of them pertaining to language) rather than being given them. This view better matches their nature as fixed repositories of knowledge, brittle where rules are missing, as opposed to something more brain-like and intelligent, capable of continual learning.