logoalt Hacker News

emsignyesterday at 11:54 PM3 repliesview on HN

An LLM does not understand what "user harm" is. This doesn't work.


Replies

peterlktoday at 12:44 AM

This argument does not make sense to me. If we push aside the philosophical debates of “understanding” for a moment, a reasoning model will absolutely use some (usually reasonable) definition of “user harm”. That definition will make its way into the final output, so in that respect “user harm” has been considered. The quality of response is one of degree, the same way we would judge a human response.

iamgiohtoday at 12:12 AM

Well, it's all about linguistic relativism, right? If you can define "user harm" in terms of things it does understand, I think you could get something that works

show 1 reply
direwolf20today at 12:54 AM

It encodes what things cause humans to argue for or against user harm. That's enough.

show 1 reply