“Oh no! We opened ten LLMs, all of which have read decades’ worth of fiction about how an AI would behave in this situation, then asked a leading question thirty times each, and on some of those runs they did the thing we were leading them toward.”
do you really think this behavior is imposed by the science fiction in the training data?