I think the biggest issue with Stable Diffusion-based approaches has always been poor compositional ability (putting things where you want them) and compounding anatomical/spatial errors that give the images an off-putting vibe.
All these problems are trivially solvable (solved) using traditional 3D meshes and techniques.
I have tried the model, and I agree with you on that point. I uploaded a product for a test; the output captures the product quite well, but the text on the generated 3D model is unreadable.
The issue with composition is only a problem when you rely on a pure text prompt; it has been solved for quite a while by ControlNets or img2img. What was lacking was integration with existing art tools, but even that is getting solved, e.g. Krita[1] has a pretty nice AI plugin.
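To make the ControlNet point concrete, here is a minimal sketch assuming the Hugging Face diffusers library and the public lllyasviel/sd-controlnet-canny checkpoint (the reference file name, prompt, and Canny thresholds are just placeholders): an edge map extracted from a reference image pins down the layout, and the text prompt only steers style and content.

    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # Extract edges from a reference image; the generated image will
    # follow this layout regardless of how the prompt is worded.
    reference = np.array(Image.open("layout_reference.png").convert("RGB"))
    edges = cv2.Canny(reference, 100, 200)
    edges = np.stack([edges] * 3, axis=-1)  # single channel -> 3-channel
    control_image = Image.fromarray(edges)

    # Load a Canny-conditioned ControlNet on top of a base SD checkpoint.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # The prompt controls content/style; the control image fixes composition.
    result = pipe(
        "a product shot of a ceramic mug on a wooden table, studio lighting",
        image=control_image,
        num_inference_steps=30,
    ).images[0]
    result.save("composed_output.png")

The same idea carries over to tool integration: the Krita plugin below essentially wires this kind of conditioning into a normal layer-based painting workflow.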
3D can be a useful intermediate when editing the 2D image, e.g. Krea has support for that[2]. But I don't think the rest of the traditional 3D pipeline is of much use here; AI image generation already produces images at a quality that traditional rendering just can't keep up with, whether in terms of speed, quality, or flexibility.
[1] https://www.youtube.com/watch?v=PPxOE9YH57E
[2] https://www.youtube.com/watch?v=0ER5qfoJXd0