Hacker News

fennecbutt, 12/18/2024

This is correct, and even image generation models aren't really trained to comprehend image composition yet.

Even the models based on Danbooru and E621 still aren't the best at that, and we furries like to tag art in detail.

The best we can really do at the moment is regional prompting; perhaps something similar is needed for video.