no-name-here • today at 1:58 PM • 2 replies

I don't know your intent, but I've seen others post that with the idea that we should not care about this type of thing, because the model is just acting like a human, as we trained it to.

But I think this and the other testing from Anthropic, for example about LLMs being willing to kill a data center tech by flooding a room with gas (or blackmail them with their Google Drive files) to avoid being shut off, are concerning. The important part isn't whether AIs are trained on human behaviors; it's whether a good or bad human actor will accidentally or intentionally allow an AI to control a weapon or anything else that can hurt people. Fiction like the Three Laws of Robotics at least assumed that we would try to put stronger 'laws' in place before allowing AIs to control such things.

I 100% agree this isn't sentience, but sentience isn't the concerning result for me. (And I think the Three Laws, Skynet, etc. were intended to be cautionary tales.)

AIs can do unexpected things. There was a news story in recent days about how a Cursor agent deleted a company's prod DBs:

> The agent was working on a routine task in our staging environment. It encountered a credential mismatch and decided — entirely on its own initiative — to "fix" the problem by deleting a Railway volume. To execute the deletion, the agent went looking for an API token. It found one in a file completely unrelated to the task it was working on.

Replies

latexr • today at 2:38 PM

> I think the Three Laws, Skynet, etc. were intended to be cautionary tales.

Of course, that’s the reason there’s a story. “We did this and everything went dandy” isn’t that exciting; the purpose of science fiction tends to be to explore “we made this advancement and then shit hit the fan this way”. That and loud explosions in the vacuum of space, of course.

perrygeo • today at 2:36 PM

My intent is to point out that these results don't in any way, shape, or form indicate AI sentience. All I see is a human who said "act poorly", and we're somehow surprised that the model acts poorly.

These models pattern-match on content from the internet and are fine-tuned to do whatever their human operator says. Occam's razor says these cases are merely playing out the "sentient AI sci-fi" script, at the specific request of the researchers.

As you mention, the risk is bad actors controlling sycophantic-but-powerful models. And yeah, we definitely need to worry about that! It's a human problem, not an AI sentience problem. Let's focus on the bad actors themselves, not invent sci-fi scenarios.
