Hacker News

TeMPOraL, today at 8:23 AM

It's novel if you've never played with img2img, especially the several forms of (text+img)2img, or if you've never tried editing images by text prompt in recent multimodal LLMs.

That said, I spent plenty of time doing both, and yet it would probably take me a while to arrive at this approach. For some reason, the "draw a sketch, have a model flesh it out" approach got bucketed with Stable Diffusion in my mind, and multimodal LLMs with "take detailed content, make targeted edits to it". So I'm glad the OP posted it.