Hacker News

tekacs today at 8:11 PM

"We want to see risks in the models, so no matter how good the performance and alignment, we’ll see risks, results and reality be damned."


Replies

randomcatuser today at 8:54 PM

i mean, to be fair, these are professional researchers.

i'm very inclined to trust them on the various ways that models can subtly go wrong, in long-term scenarios

for example, consider using models to write email -- is it a misalignment problem if the model is just too good at writing marketing emails?? or too good at getting people to pay a spammy company?

another hot use case: biohacking. if a model is used to do really hardcore synthetic chemistry, one might not realize it's potentially harmful until it's too late (i.e., the human splits up a problem so that no guardrails are triggered)