It is interesting what the nbp model takes away from the prompt, though
E.g. instead of focusing on the artist, it focuses on the location
This makes sense! I imagine it was trained in some sort of RLVR-like way, where you give it a prompt and then interrogate the output with questions like "does this image ..." (each question examining a different aspect of the prompt)
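Rough sketch of the kind of checklist reward I'm guessing at. To be clear, this is speculation and not how nbp is known to be trained: `Check`, `checklist_reward` and `toy_judge` are made-up names, and the judge is a toy stand-in for whatever would actually answer the questions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Check:
    aspect: str    # which part of the prompt this question probes
    question: str  # yes/no question put to the judge


def checklist_reward(
    image: object,
    checks: Sequence[Check],
    judge: Callable[[object, Check], bool],
) -> float:
    """Fraction of checks the judge answers 'yes' to (0.0 if there are none)."""
    if not checks:
        return 0.0
    return sum(judge(image, c) for c in checks) / len(checks)


# Toy stand-in (assumption): an "image" is just the set of aspects it gets right.
def toy_judge(image, check):
    return check.aspect in image


checks = [
    Check("artist", "does this image match the named artist's style?"),
    Check("location", "does this image show the named location?"),
]

print(checklist_reward({"location"}, checks, toy_judge))            # 0.5
print(checklist_reward({"artist", "location"}, checks, toy_judge))  # 1.0
```

With a reward shaped like this, an image that nails the location but ignores the artist still collects partial credit, which would fit what I'm seeing.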
It's obviously an incredible model. I think another article praising it is of limited use compared with one expressing frustration
I would also welcome someone writing a short takedown where they fix the prompts and get better-than-2022 results from nbp