tekacs • today at 8:11 PM

"We want to see risks in the models, so no matter how good the performance and alignment, we’ll see risks, results and reality be damned."

Replies

randomcatuser • today at 8:54 PM

i mean, to be fair, these are professional researchers.

i'm very inclined to trust them on the various ways that models can subtly go wrong, in long-term scenarios

for example, consider using models to write email -- is it a misalignment problem if the model is just too good at writing marketing emails? or too good at getting people to pay a spammy company?

another hot use case: biohacking. if a model is used to do really hardcore synthetic chemistry, one might not realize that it's potentially harmful until it's too late (i.e., the human is splitting up a problem so that no guardrails are triggered)
