This reinforces my suspicion that alignment, and training in general, is closer to being a pedagogical problem than anything else. Given a finite amount of training input, how do we elicit the desired model behavior? I'm not sure that asking educators is the right answer, but it's one place to start.
Side note: Anthropic has done a good job of establishing an immediately recognizable art style.