Hacker News

paytonjjones today at 2:35 AM

Exposure to horrors doesn't imply capability or desire to commit said horrors. But it does seem like kind of a prerequisite.

All else being equal, I think I'd prefer my models to be naive about human degradation and torture, for instance, with exceptions for specialized models used in police work, etc.

I do think broader alignment is necessary either way, but that naivety seems like an extra guardrail it'd be nice to have.