Visual presentation has been a weak point of AI generation for me. There isn't much tooling that lets a model see how a potential presentation would actually appear to a human.
Models that take visual input seem more focused on identifying what is in an image than on how a human would perceive it, and most interfaces lack any automated feedback mechanism that lets the model look at what it has made.
In short, I have made some fun things with AI, but I still end up writing CSS by hand.