The problem is not in the image models rather the training data and its context. "British museum" for MJ is the image source, "British museum" is the setting for Nano Banana.