Hacker News

jimmytucson, yesterday at 10:48 PM

Pretty much mirrors my experience using GPT to generate images creatively. I tried to generate an image to accompany a Robert Frost poem and it made something... plausibly related. But not what I was describing. I spent the next 90% of the time making it 10% closer to what I wanted but it never got all the way there.

I’ve given it different levels of open-endedness: "give this flow chart an aesthetic like this mechanical keyboard", or "generate an SVG of this graphic from a 70s slide show". But it never looks quite like what I have in mind.

In the end, I think you only use this stuff to generate images if you’re prepared to accept whatever comes out on approximately the first try.


Replies

TheOtherHobbes, today at 12:04 AM

This isn't a solvable problem without world models. Tokenised prompting is like stabbing a pin at a huge target in the dark. Sometimes something interesting falls out, but latent space doesn't have the definition to give most people exactly what they want.

When it does, it's more likely to be something popular and unoriginal, where the data is dense, and less likely to be something inventive and strange.
