Hacker News

xigoi today at 6:30 AM, 4 replies

The standard objection: if the LLM is supposedly intelligent, why can’t it figure out on its own that this two-step process would achieve a better result?


Replies

jstanley today at 6:43 AM

[flagged]

pyrolistical today at 7:32 AM

You don’t know what you don’t know

nine_k today at 6:32 AM

Nobody asked it to!

cubefox today at 6:41 AM

Part of the problem is that the LLM isn't making the image directly; it is repeatedly prompting edits for a separate edit diffusion model. The Gemini reasoning summary shows part of this. The style of some of the images also makes it clear that it uses an Imagen 4-derived diffusion model underneath.
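
To make that split concrete, here is a toy sketch of the loop as I understand it. Everything in it is an assumption for illustration: `toy_planner` stands in for the LLM, `toy_edit_model` stands in for the separate edit diffusion model, and `Image` is just a list of applied edits. None of this is Gemini's or Imagen's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Image:
    """Toy stand-in for an image: only records the edits applied so far."""
    edits: List[str] = field(default_factory=list)


def toy_edit_model(image: Image, instruction: str) -> Image:
    """Hypothetical edit model: returns a new image with one more edit applied."""
    return Image(edits=image.edits + [instruction])


def toy_planner(goal: List[str], image: Image) -> Optional[str]:
    """Hypothetical planner (the LLM's role): pick the next missing edit, or None when done."""
    for step in goal:
        if step not in image.edits:
            return step
    return None


def run_edit_loop(
    goal: List[str],
    planner: Callable[[List[str], Image], Optional[str]],
    edit_model: Callable[[Image, str], Image],
    max_steps: int = 10,
) -> Image:
    """The planner never produces pixels; it only issues instructions to the edit model."""
    image = Image()
    for _ in range(max_steps):
        instruction = planner(goal, image)
        if instruction is None:
            break  # planner considers the image finished
        image = edit_model(image, instruction)
    return image


if __name__ == "__main__":
    final = run_edit_loop(["draw outline", "add color"], toy_planner, toy_edit_model)
    print(final.edits)
```

The point of the sketch is only that the planner sees results and sends text instructions; whatever the edit model does with those instructions is outside its direct control.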