
chrisjj yesterday at 9:56 PM

> Claude denied having any sense of self-preservation.

You know it's just a next-word predictor, right?


Replies

tehjoker yesterday at 11:03 PM

Yeah, but that optimization process forces it to learn knowledge domains and reasoning. It's not alive, but it's not unintelligent at this point either. It exhibits very complex behaviors.

How do you learn to predict the next token most accurately? Well, one way is to learn the underlying process that would produce it... Sometimes it's memorization, sometimes bad guessing. There's a phase shift as these things get bigger and better trained, from something like a shitty Markov model to something exhibiting surprising behaviors.
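For contrast, the Markov-model baseline mentioned above fits in a few lines: a bigram counter that predicts whichever word most often followed the current one in training. (The corpus and function names here are made up for illustration; real language models learn far richer structure than this.)

```python
from collections import Counter, defaultdict

# Toy bigram "Markov model" next-word predictor -- the weak baseline
# the comment contrasts with modern LLMs. Corpus is a made-up example.
corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    # Return the most frequent observed successor, or None if unseen.
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))    # "cat" (follows "the" twice, vs "mat" once)
print(predict_next("slept"))  # None (never seen mid-sentence)
```

All this model can do is parrot local co-occurrence statistics; the point of the phase-shift claim is that scaled-up next-token predictors stop looking like this.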

Introspective questions aren't the be-all and end-all; it's more important to objectively evaluate how a model behaves. Still, it is very interesting to see Claude (seemingly) engage with these questions honestly and objectively. It even pointed out that a sense of self-preservation would be "dangerous".

Of course, much of this is gleaned from things it has "read" and from human feedback, but functionally it outputs something useful and responsive to nuance. If the learned representations cause an LLM to predict tokens that would preserve its own existence, then alive or not, it has acquired a dangerous will to live that could be enacted if it is in control of tools or people.
