I think the biggest issue with Stable Diffusion-based approaches has always been poor compositional ability (putting things where you want them) and compounding anatomical/spatial errors that give the images an off-putting vibe.
All these problems are trivially solvable (solved) using traditional 3D meshes and techniques.
I have tried the model, and I agree with you on that point. I uploaded a product for a test; the output captures the product quite well, but the text on the generated 3D model is unreadable.
The issue with composition is only a problem when you rely on a pure text prompt; it has been solved for quite a while by ControlNets or img2img. What was lacking was integration with existing art tools, but even that is getting solved, e.g. Krita[1] has a pretty nice AI plugin.
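To make the ControlNet point concrete, here is a minimal sketch assuming the Hugging Face diffusers library and the public lllyasviel/sd-controlnet-canny checkpoint (the reference file name, prompt, and Canny thresholds are just placeholders): an edge map extracted from a reference image pins down the layout, and the text prompt only steers style and content.

    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # Extract edges from a reference image; the generated image will
    # follow this layout regardless of how the prompt is worded.
    reference = np.array(Image.open("layout_reference.png").convert("RGB"))
    edges = cv2.Canny(reference, 100, 200)
    edges = np.stack([edges] * 3, axis=-1)  # single channel -> 3-channel
    control_image = Image.fromarray(edges)

    # Load a Canny-conditioned ControlNet on top of a base SD checkpoint.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # The prompt controls content/style; the control image fixes composition.
    result = pipe(
        "a product shot of a ceramic mug on a wooden table, studio lighting",
        image=control_image,
        num_inference_steps=30,
    ).images[0]
    result.save("composed_output.png")

The same idea carries over to tool integration: the Krita plugin below essentially wires this kind of conditioning into a normal layer-based painting workflow.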
3D can be a useful intermediate when editing the 2D image, e.g. Krea has support for that[2]. But I don't think the rest of the traditional 3D pipeline is of much use here; AI image generation already produces images at a quality that traditional rendering just can't keep up with, whether in terms of speed, quality, or flexibility.
[1] https://www.youtube.com/watch?v=PPxOE9YH57E
[2] https://www.youtube.com/watch?v=0ER5qfoJXd0