
harperlee · today at 6:24 PM · 3 replies

Two handwavey ideas upon reading this:

- Even for billion-parameter theories, a small number of vectors might dominate the behaviour. A coordinate-shift approach (PCA) might surface new concepts that enable us to model that phenomenon. "A change in perspective is worth 80 IQ points", said Alan Kay.

- There may be an analogue of how we come up with cognitive metaphors of the mind ("our models of the mind resemble our latest technology: abacus, mechanisms, computer, neural network") that could be applied to other complicated areas of reality.
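The first idea above can be sketched numerically. This is a hypothetical toy example (the data, sizes, and noise level are all made up): if a high-dimensional system is really driven by a few latent directions, PCA on observations of it will show a handful of components dominating the variance.

```python
import numpy as np

# Hypothetical toy data: 1000 observations in 50 dimensions, but the
# signal actually lives in a 3-dimensional subspace plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))    # 3 dominant directions
mixing = rng.normal(size=(3, 50))      # embed them in 50 dimensions
data = latent @ mixing + 0.01 * rng.normal(size=(1000, 50))

# PCA via SVD of the centered data matrix.
centered = data - data.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# The first three components capture nearly all of the variance,
# even though the ambient space has 50 coordinates.
print(explained[:5])
print(explained[:3].sum())
```

In the new coordinate system the "billion-parameter" description collapses to a few numbers per observation; the hope expressed above is that something similar could surface interpretable concepts.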


Replies

pash · today at 9:23 PM

> Even for billion-parameter theories, a small number of vectors might dominate the behaviour.

We kinda-sorta already know this is true. The lottery-ticket hypothesis [0] says that every large network contains a small subnetwork which, thanks to its random initialization, can be trained to perform about as well as the overall network, and over the past eight years or so researchers have indeed managed to find such small networks inside large networks of many different architectures.

Nobody talks much about the lottery-ticket hypothesis these days because it isn’t practically useful at the moment. (With the pruning algorithms and hardware we have, pruning is more costly than just training a big network.) But the basic idea does suggest that there may be hope for interpretability, at least in the odd application here or there.

That is, the lottery-ticket hypothesis suggests that the training process is a search through a large parameter space for a small network that already (by random initialization) exhibits the overall desired network behavior; updating parameters during the training process is mostly about turning off the irrelevant parts of the network.

For some applications, one would think that the small sub-network hiding in there somewhere might be small enough to be interpretable. In particular, I would not be surprised if, some day not too far in the future, investigating neural networks starts to yield good interpretable models of phenomena of intermediate complexity: phenomena too complex to be amenable to classic scientific techniques, but simple enough that a neural network yields an unusually small active sub-network.

0. https://en.wikipedia.org/wiki/Lottery_ticket_hypothesis
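The pruning intuition behind this can be sketched with a deliberately simple stand-in. This is not the full lottery-ticket procedure (which rewinds the surviving weights to their original initialization and retrains); it is just a toy linear model where only a few of many inputs matter, showing that magnitude pruning recovers a tiny, interpretable sub-model that predicts almost as well as the dense one. All sizes and seeds are invented for illustration.

```python
import numpy as np

# Toy setup: 500 samples, 200 features, but only 5 features are relevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 200))
true_w = np.zeros(200)
true_w[:5] = 3.0 * rng.normal(size=5)      # only 5 relevant features
y = X @ true_w + 0.1 * rng.normal(size=500)

# Dense fit via least squares (stand-in for "training the big network").
w_dense = np.linalg.lstsq(X, y, rcond=None)[0]

# Prune: keep the 5 largest-magnitude weights, zero out the other 195.
mask = np.zeros_like(w_dense)
mask[np.argsort(np.abs(w_dense))[-5:]] = 1.0
w_sparse = w_dense * mask

# The sparse model's error is tiny compared to the variance of y:
# almost all predictive power lives in a handful of weights.
dense_err = np.mean((X @ w_dense - y) ** 2)
sparse_err = np.mean((X @ w_sparse - y) ** 2)
print(dense_err, sparse_err, np.var(y))
```

A 5-weight model is trivially inspectable; the speculation above is that some real phenomena might likewise hide an "unusually small active sub-network" inside a big trained model.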

aldousd666 · today at 8:07 PM

I don't disagree, but neither does the article. It's just talking about the fact that we previously considered anything that can't be easily and tersely written down as nearly or entirely intractable. But, as we have seen, the three-body problem is not really a hum-dinger as far as the universe goes; it's not even table stakes. We need to be able to do the same kind of energy arbitrage on n-body problems that we do on two-body ones. And now we have the beginnings of a place to toy with more complicated ideas -- since these won't fit on a blackboard.

simianwords · today at 7:02 PM

Maybe we can come up with smaller models that perform almost as well as the bigger ones. Could that just be PCA of some kind?

GPT nano vs. GPT-5, for example.
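One concrete "PCA of some kind" is low-rank compression of a weight matrix via truncated SVD. The sketch below uses a made-up matrix with approximately low-rank structure; real small models like the ones mentioned are produced differently (e.g. by distillation, training the small model to imitate the big one), so this only illustrates the linear-algebra intuition.

```python
import numpy as np

# Fake "big model" weight matrix with approximately rank-16 structure.
rng = np.random.default_rng(2)
W = rng.normal(size=(512, 16)) @ rng.normal(size=(16, 512))
W += 0.01 * rng.normal(size=(512, 512))    # small full-rank noise

# Truncated SVD: keep only the 16 dominant directions.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 16
W_small = (U[:, :k] * s[:k]) @ Vt[:k]

# Parameters drop from 512*512 = 262,144 to 2*512*16 + 16 = 16,400,
# yet the relative approximation error stays tiny.
rel_err = np.linalg.norm(W - W_small) / np.linalg.norm(W)
print(rel_err)
```

When the big matrix really is close to low rank, almost nothing is lost; whether real LLM weight matrices are compressible this way, and by how much, is exactly the kind of empirical question the comment is raising.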