I don't see this being such a big gap. There are some use cases for sure, but apart from UX/UI work it is not really needed. Besides, none of the frontier models can replicate actual images; they can approximate them, at least in my own experience.
When using LLMs to generate docx files, being able to rasterize and review the output is an important part of the process.
One of my tests for a new model is dumping in a screenshot of a web page and seeing if it can recreate it from scratch in HTML and CSS.
Even the local models I run on my Mac are getting surprisingly good at that now.