My issue with this model is that it keeps producing Chinese people and Chinese text. I have to go out of my way to specify exactly what race a person should be.
If I say “A man”, it’s fine. A black man, no problem. It’s when I add context and instructions that it just seems to default to a Chinese man. Which is fine in itself, but I’d like it to be trained on a wider variety of people so it produces more diverse images. For non-people subjects it’s amazingly good.
All modern models have their own default look. Getting meaningful variety of outputs for the same input out of a finetuned model is still an open technical problem. It's not impossible, but it's not solved either.
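In the meantime the usual workaround is to inject the variety yourself at prompt time. Here's a minimal sketch, assuming a diffusers-style text-to-image pipeline; the model id and the descriptor list are placeholders, not anything specific to the model discussed here:

```python
# Minimal sketch: randomize seeds and prompt descriptors to get more varied people.
# Assumes a diffusers-compatible checkpoint; "some/model-id" is a placeholder.
import random
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "some/model-id", torch_dtype=torch.float16
).to("cuda")

prompt = "A man reading a newspaper in a busy cafe"

# Hypothetical descriptor list; pick whatever mix you actually want.
descriptors = ["Nigerian", "Brazilian", "Indian", "Swedish", "Mexican", "Japanese"]

images = []
for _ in range(4):
    # Swap the generic subject for a randomly chosen one each time.
    varied_prompt = prompt.replace("A man", f"A {random.choice(descriptors)} man")
    # A fresh seed per image so repeated prompts don't collapse to the same face.
    generator = torch.Generator("cuda").manual_seed(random.randint(0, 2**32 - 1))
    images.append(pipe(varied_prompt, generator=generator).images[0])
```

It's a blunt instrument, but until finetunes stop collapsing toward a default look, prompt-side randomization is about the only lever you have.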