They’re multimodal LLMs trained for image generation. Turns out that if you want to generate images ...

taneq • today at 8:28 AM • 1 reply • view on HN

They’re multimodal LLMs trained for image generation. Turns out that if you want to generate images you gotta know what things look like.

TZubiri • today at 9:27 AM

That's not helpful my brother. If you have details share them, if not, don't pretend you are more illuminated than me.

Is the image(text) function reversible? Or are they brute force searching a nearest neighbor like word2vec/hash brute forcing.

➕ show 1 reply

alt Hacker News