Hacker News

cjkaminski yesterday at 8:57 PM

Crying wolf? That doesn't feel like an apt comparison, because Anthropic/Dario are not saying "The wolf is here and will attack the flock tonight."

What they are saying, according to my interpretation, is "this thing might become a wolf at some point in the future, and it's starting to show signs of wolf-like behavior. We should proceed with caution."

One version of this story is hyperbolic. Both are cautionary. Let's proceed accordingly.


Replies

charcircuit today at 12:03 AM

>"The wolf is here and will attack the flock tonight."

How is that not what they are saying?

"GPT-2 XL is here and if we released it the flock would be attacked tonight."

Each time this plays out and the public eventually gets access to a model, it turns out the flock is still there in the morning.
