Hacker News

tkgally · today at 1:31 AM

I am unsure myself whether we should regard LLMs as mere token-predicting automatons or as some new kind of incipient intelligence. Despite their origins as statistical parrots, the interpretability research from Anthropic [1] suggests that structures corresponding to meaning do exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought.

That said, I was struck by a recent interview with Anthropic’s Amanda Askell [2]. When she talks, she anthropomorphizes LLMs constantly. A few examples:

“I don't have all the answers of how should models feel about past model deprecation, about their own identity, but I do want to try and help models figure that out and then to at least know that we care about it and are thinking about it.”

“If you go into the depths of the model and you find some deep-seated insecurity, then that's really valuable.”

“... that could lead to models almost feeling afraid that they're gonna do the wrong thing or are very self-critical or feeling like humans are going to behave negatively towards them.”

[1] https://www.anthropic.com/research/team/interpretability

[2] https://youtu.be/I9aGC6Ui3eE


Replies

Kim_Bruning · today at 2:06 AM

Amanda Askell studied under David Chalmers at NYU: the philosopher who coined "the hard problem of consciousness" and is famous for taking phenomenal experience seriously rather than explaining it away. That context makes her choice to speak this way more striking: this isn't naive anthropomorphizing from someone unfamiliar with the debates. It's someone trained by one of the most rigorous philosophers of consciousness, who knows all the arguments for dismissing mental states in non-biological systems, and is still choosing to speak carefully about models potentially having something like feelings or insecurities.

CGMthrowaway · today at 1:43 AM

>research from Anthropic [1] suggests that structures corresponding to meaning exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought.

Can you give some concrete examples? The link you provided is kind of opaque.

>Amanda Askell [2]. When she talks, she anthropomorphizes LLMs constantly.

She is a philosopher by trade, and she describes her job (model alignment) as literally ensuring that models "have good character traits." I imagine that explains a lot.

andai · today at 2:26 AM

Well, she's describing the system's behavior.

My fridge happily reads inputs without consciousness, has goals and takes decisions without "thinking", and consistently takes action to achieve those goals. (And it's not even a smart fridge! It's the one with a copper coil or whatever.)

I guess the cybernetic language might be less triggering here (talking about systems and measurements and control), but they're basically the same underlying principles. One is just "human flavored" and therefore more prone to invite unhelpful lines of thinking?

Except that the "fridge" in this case is specifically and explicitly designed to emulate human behavior so... you would indeed expect to find structures corresponding to the patterns it's been designed to simulate.

Wondering if it's internalized any other human-like tendencies — having been explicitly trained to simulate the mechanisms that produced all human text — doesn't seem too unreasonable to me.

visarga · today at 5:19 AM

> the interpretability research from Anthropic [1] suggests that structures corresponding to meaning do exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought

I did a simple experiment: I took a photo of my kid in the park, showed it to Gemini, and asked for a "detailed description". Then I took that description and put it into a generative model (Z-Image-Turbo, a new one). The output image was almost identical.

So one model converted the image to text, and the other reversed the process. The photo was completely new, personal, never put online, so it was not in any training set. How did these two models do it if not by actually using language like a thinking agent?

https://pbs.twimg.com/media/G7gTuf8WkAAGxRr?format=jpg&name=...
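
If anyone wants to try the same round trip, here's a minimal sketch of the two steps (not the exact setup; the Gemini model name, the Z-Image-Turbo repo id, and the diffusers loading path are all assumptions):

    # Sketch of the image -> caption -> image round trip.
    # Assumptions: gemini-1.5-flash as the captioner and a generic
    # diffusers text-to-image pipeline; the repo id below is a guess.
    import google.generativeai as genai
    import torch
    from diffusers import AutoPipelineForText2Image
    from PIL import Image

    genai.configure(api_key="YOUR_API_KEY")

    # Step 1: image -> detailed text description
    photo = Image.open("kid_in_park.jpg")
    captioner = genai.GenerativeModel("gemini-1.5-flash")
    description = captioner.generate_content(
        ["Give a detailed description of this photo.", photo]
    ).text

    # Step 2: description -> regenerated image
    pipe = AutoPipelineForText2Image.from_pretrained(
        "Tongyi-MAI/Z-Image-Turbo",  # placeholder repo id
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    pipe(prompt=description).images[0].save("regenerated.png")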

jimbokun · today at 4:43 AM

Wow those quotes are extremely disturbing.

Manfred · today at 7:54 AM

This argument would have a lot more weight if it were published in a peer-reviewed journal by a party that does not have a stake in the AI market.

electroglyph · today at 1:45 AM

the anthropomorphization (say that 3 times quickly) is kinda weird, but also makes for a much more pleasant conversation imo. it's kinda tedious being pedantic all the time.

bamboozled · today at 2:17 AM

I've been using LLMs heavily for work for about six months. I see almost zero "thought" going on and a LOT of pattern matching. You can use that knowledge to your advantage once you understand it. If you're relying on them to "think", disaster will ensue. At least that's been my experience.

I've completely given up on using LLMs for anything more than a typing assistant / translator and maybe an encyclopedia when I don't care about correctness.