An LLM does not understand what "user harm" is. This doesn't work.

emsign • yesterday at 11:54 PM • 3 replies • view on HN

Replies

This argument does not make sense to me. If we push aside the philosophical debates of “understanding” for a moment, a reasoning model will absolutely use some (usually reasonable) definition of “user harm”. That definition will make its way into the final output, so in that respect “user harm” has been considered. The quality of response is one of degree, the same way we would judge a human response.

iamgioh • today at 12:12 AM

Well, it's all about linguistic relativism, right? If you can define "user harm" in terms of things it does understand, I think you could get something that works

➕ show 1 reply

direwolf20 • today at 12:54 AM

It encodes what things cause humans to argue for or against user harm. That's enough.

➕ show 1 reply

alt Hacker News

Replies