E.g., if the driver brakes because they saw a pothole, and the lidar picks up someone biking 200 m away on a separate path, the training may mistakenly attribute the braking to the object 200 m away (because it's a large moving object) rather than to the pothole.
I'm exaggerating, but I hope you get the point. The problem isn't conflicting sensor signals about the pothole; it's conflicting information about causation. With vision only, there is no such conflict in the training data. This was my aha moment. Multiple sensors are absolutely important for fallback and extra safety, but they screw up training that is based on human drivers.
I think Elon himself doesn't understand this and hence can't articulate it, and is just repeating whatever his ML engineers have said.