Hacker News

vunderba yesterday at 9:17 PM

Anything that needs to overcome concepts which are disproportionately represented in the training data is going to give these models a hard time.

Try generating:

- A spider missing one leg

- A 9-pointed star

- A 5-leaf clover

- A man with six fingers on his left hand and four fingers on his right

You'll be lucky to get a 25% success rate.

The last one is particularly ironic given how much work went into FIXING the old SD 1.5 issues with hand anatomy... to the point where I'm seriously considering incorporating it as a new test scenario on GenAI Showdown.


Replies

XenophileJKO yesterday at 11:12 PM

It mostly depends on how the models work. Unified multimodal text/image sequence-to-sequence models can do this pretty well; diffusion models can't.

moonu yesterday at 9:45 PM

https://gemini.google.com/share/8cef4b408a0a

Surprisingly, it got all of them right.
