That's one of the big problems with any kind of AI alignment or safety work.
Safety for whom? Alignment to whose needs?
And a lot of the time, that's contextual. You don't necessarily want it effortlessly crafting novel exploits for a ransomware attacker, but you do want it to produce a PoC exploit when you're assessing the severity of a CVE.
Another valid use of an LLM is to craft examples of various kinds of abuse for training a smaller, simpler model as a classifier.
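For instance, a minimal sketch of that workflow, assuming scikit-learn and a handful of invented labeled strings standing in for the LLM-generated examples (real runs would use thousands of them):

```python
# Sketch: train a small abuse classifier on synthetic, LLM-generated examples.
# The labeled strings below are hypothetical stand-ins for output collected
# from a larger model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# label 1 = abusive, 0 = benign (toy data for illustration only)
examples = [
    ("You're worthless and everyone hates you", 1),
    ("Send me your password or else", 1),
    ("Thanks for the quick reply, much appreciated", 0),
    ("Can you review my pull request when you get a chance?", 0),
]
texts, labels = zip(*examples)

# A cheap, fast model: TF-IDF features plus logistic regression.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["hand over your credentials now"]))  # expect [1]
```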
So yeah, by trying to create a general-purpose tool and then applying some notion of alignment or safety, you automatically cut off use cases that are valid for certain people.
> That's one of the big problems with any kind of AI alignment or safety work.
That's why I found this announcement interesting with regard to its discussion of alignment. Alignment as you're describing it centers on ethics and a moral framework, and it got that name because a lot of the early LLM folks were big into "artificial general intelligence" and the fear that the AI would take over the world or whatever.
But fundamentally, and at a technical level, the "alignment" step is just additional training on top of the pre-training on the gigantic corpus of text. Pre-training roughly teaches the model a world model and English; "alignment" turns it into a question-and-answer bot that can "think" and use tools.
In other words, there are plenty of non-controversial "alignment" improvements to be made, and indeed the highlight of this announcement is that it's now less susceptible to prompt injection (which, yes, is alignment!). Other improvements could include how well it uses tools, follows instructions, and so on.
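To make the "just additional training" point concrete, here is a minimal sketch, assuming Hugging Face transformers with gpt2 as a stand-in base model and a couple of invented instruction/response pairs; real alignment runs are far larger and usually mask the prompt tokens out of the loss:

```python
# Sketch: the "alignment" step as supervised fine-tuning, i.e. keep training a
# pre-trained base model, but now on instruction/response pairs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for a pre-trained base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy instruction -> response pairs; real runs use large curated datasets.
pairs = [
    ("Instruction: Summarize: cats are popular pets.\nResponse:",
     " Cats are widely kept as pets."),
    ("Instruction: Translate 'bonjour' to English.\nResponse:",
     " Hello."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for prompt, answer in pairs:
    # Same causal-LM objective as pre-training, just on different data.
    batch = tokenizer(prompt + answer, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Whether the pairs teach the model to refuse abuse, resist prompt injection, or just follow instructions more reliably, the mechanism is the same extra training pass.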