Composition is only a problem when you rely on a pure text prompt, and it has been solved for quite a while by ControlNets or img2img. What was lacking was integration with existing art tools, but even that is getting solved, e.g. Krita[1] has a pretty nice AI plugin.
3D can be a useful intermediate when editing the 2D image, e.g. Krea has support for that[2]. But I don't think the rest of the traditional 3D pipeline is of much use here. AI image generation already produces images that traditional rendering just can't keep up with, whether in speed, quality or flexibility.
But not consistent state. The pipeline still needs to exist because most games require objects and environments to stay consistent across play sessions. That means generating from a 3D skeleton, at the very least, if not relegating genAI to production, not runtime.
Wow, those look impressive. But I think we are saying the same thing - Stable Diffusion can make pretty pics, but needs a lot of handholding. I too have played around with ComfyUI, and while there are a LOT of techniques that allow you to manipulate the image, it has always felt like fighting SD.
In the videos you've attached, both tools (esp. the first) look impressive, but in the first example you can clearly see that the model regenerates the street around the chameleon for no good reason when the artist changes it.
In the second example you can see there's a bunch of AI tools under the hood, and they don't work together particularly well, with the car constantly changing as the image changes.
I think a lot of mileage can be extracted from SD as it stands (I could think of a bunch of improvements to what was demonstrated here by applying existing techniques), but the fundamental issue remains: Stable Diffusion was made to generate whole images at once, unlike autoregressive transformers, which output a single token at a time.
Not sure what the image equivalent of a token is, but I'm sure it'd be feasible to train a model to fill holes - which'd be created by Segment Anything or something similar - and it would react better to local edits.
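To make the "local edit" idea concrete, here's a minimal sketch of the constraint I mean, not a trained hole-filling model: whatever the generator outputs, only the pixels inside the mask are allowed to change. The function name `composite_local_edit` and the toy arrays are made up for illustration; in practice the mask would come from something like Segment Anything and `generated` from an inpainting model.

```python
import numpy as np

def composite_local_edit(original, generated, mask):
    """Keep original pixels where mask is 0; take generated pixels where mask is 1."""
    # (H, W) -> (H, W, 1) so the mask broadcasts over the colour channels.
    mask = mask.astype(original.dtype)[..., None]
    return original * (1 - mask) + generated * mask

# Toy 4x4 RGB "image". The 2x2 hole stands in for a segmentation mask.
original = np.zeros((4, 4, 3), dtype=np.float32)
generated = np.ones((4, 4, 3), dtype=np.float32)  # stand-in for a model's full-image output
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0

edited = composite_local_edit(original, generated, mask)
```

With this, the street outside the mask can't be regenerated by construction, though the seam between old and new pixels is still the model's problem.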