LLMs are really bad at anything visual, as demonstrated by pelicans riding bicycles or Claude Plays Pokémon.
Opus would probably do better though.
How could they be any good at visuals? They are trained on text after all.