>research from Anthropic [1] suggests that structures corresponding to meaning exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought.
Can you give some concrete examples? The link you provided is kind of opaque.
>Amanda Askell [2]. When she talks, she anthropomorphizes LLMs constantly.
She is a philosopher by trade, and she describes her job (model alignment) as being literally to ensure models "have good character traits." I imagine that explains a lot.
Here are three of the Anthropic research reports I had in mind:
https://www.anthropic.com/news/golden-gate-claude
Excerpt: “We found that there’s a specific combination of neurons in Claude’s neural network that activates when it encounters a mention (or a picture) of this most famous San Francisco landmark.”
https://www.anthropic.com/research/tracing-thoughts-language...
Excerpt: “Recent research on smaller models has shown hints of shared grammatical mechanisms across languages. We investigate this by asking Claude for the ‘opposite of small’ across different languages, and find that the same core features for the concepts of smallness and oppositeness activate, and trigger a concept of largeness, which gets translated out into the language of the question.”
https://www.anthropic.com/research/introspection
Excerpt: “Our new research provides evidence for some degree of introspective awareness in our current Claude models, as well as a degree of control over their own internal states.”