There are already techniques for controlling AI-generated images. There's ControlNet for Stable Diffusion, and there are already techniques to take existing footage and style-morph it with AI. For larger-budget productions, I'd expect video production tooling to emerge where directors and animators have fine-grained influence and control over the wireframes within a 3D scene, so they can directly prevent or fix issues like clipping, volumetric changes, visual consistency, text generation, gravity, etc. Or they could even record and produce their video in a lower-budget format and then have it re-rendered with AI to set the style or mood while adhering to scene layout, perspective, timing, cuts, etc. That wouldn't just be for mitigating AI errors, but also for controlling their vision of the final product.
Or they could simply brute-force it: cut the scene at the problem point and have the model try again with another re-render from that point until the output is no longer problematic. Or they could do the bulk of the work with AI and use video inpainting to fix small areas, or reserve human CGI artists for problems the AI can't mitigate, provided those are fixable without a full re-render (whichever probably ends up less expensive).
Plus, given the world models that have been released in the last week or so, AI will soon get better at maintaining a full and accurate representation of the world it creates, and future generations of this technology beyond what Sora is doing simply won't make these mistakes.