>there’s no way that’s going to happen automatically
They train their model in a pretty straightforward way, so the same approach could also be used to capture the distortion: just use a non-monochrome (possibly moving) background optimized for this. It's a matter of effort and attention to detail during training (uneven green screen lighting, reflections, etc.), not a fundamental impossibility.
Yes, but the main issue is the way they formulate the problem. Their output is always a transparency mask, and a per-pixel mask can only blend each pixel with the background directly behind it, so of course it will never handle distortions.
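To make the mask limitation concrete, here's a toy 1D sketch. It's not their model, just the standard matting equation C = alpha*F + (1 - alpha)*B next to a made-up "refraction" that shifts where a pixel samples the background. The function names (`composite`, `refract`) and all the values are made up for illustration.

```python
def composite(alpha, fg, bg):
    """Standard matting model: C[i] = alpha[i]*F[i] + (1 - alpha[i])*B[i]."""
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, fg, bg)]


def refract(bg, offsets):
    """Toy distortion: pixel i shows the background at i + offsets[i] (clamped)."""
    n = len(bg)
    return [bg[min(max(i + d, 0), n - 1)] for i, d in enumerate(offsets)]


# Two backgrounds that differ only at pixel 3.
bg_a = [0.0, 0.2, 0.4, 0.6, 0.8]
bg_b = list(bg_a)
bg_b[3] = 1.0

# A half-transparent foreground pixel at index 2 (hypothetical values).
alpha = [0.0, 0.0, 0.5, 0.0, 0.0]
fg = [0.0, 0.0, 1.0, 0.0, 0.0]

# A "glass" pixel at index 2 that bends light from index 3.
offsets = [0, 0, 1, 0, 0]

matte_a = composite(alpha, fg, bg_a)
matte_b = composite(alpha, fg, bg_b)
glass_a = refract(bg_a, offsets)
glass_b = refract(bg_b, offsets)

# The matte at pixel 2 cannot see the change at pixel 3; the glass can.
print("matte pixel 2 unchanged:", matte_a[2] == matte_b[2])
print("glass pixel 2 unchanged:", glass_a[2] == glass_b[2])
```

Whatever alpha and foreground you pick, the composited pixel only ever depends on the background at the same location, so no amount of training fixes this. You'd need the output to include something like a per-pixel offset field as well.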