This reinforces my suspicion that alignment and training in general is closer to being a pedagogical problem than anything else. Given a finite amount of training input, how do we elicit the desired model behavior? I’m not sure if asking educators is the right answer, but it’s one place to start.
inb4 there will be a whole new field of research that is basically psychology / pedagogy for AI. Who will be the Sigmund Freud of AI?