For visual art I feel that the existing approaches in context engineering are very much lacking. An Ai understands well enough such simple things as content (bird, dog, owl etc), color (blue green etc) and has a fair understanding of foreground/background. However, the really important stuff is not addressed.
For example: in form, things like negative shape and overlap. In color contrast things like Ratio contrast and dynamic range contrast. Or how manipulating neighboring regional contrast produces tone wrap. I could go on.
One reason for this state of affairs is that artists and designers lack the consistent terminology to describe what they are doing (though this does not stop them from operating at a high level). Indeed, many of the terms I have used here we (my colleagues and I) had to invent ourselves. I would love to work with an AI guru to address this developing problem.
> artists and designers lack the consistent terminology to describe what they are doing
I don't think they do. It may not be completely consistent, but open any art book and you find the same thing being explained again and again. Just for drawing humans, you will find emphasis on the skeleton and muscle volume for forms and poses, planes (especially the head) for values and shadows, some abstract things like stability and line weight, and some more concrete things like foreshortening.
Several books and course have gone over those concepts. They are not difficult to explain, they are just difficult to master. That's because you have to apply judgement for every single line or brush stroke deciding which factors matter most and if you even want to do the stroke. Then there's the whole hand eye coordination.
So unless you can solve judgement (which styles derive from), there's not a lot of hope there.
ADDENDUM
And when you do a study of another's work, it's not copying the data, extracting colors, or comparing labels,... It's just studying judgement. You know the complete formula from which a more basic version is being used for the work, and you only want to know the parameters. Whereas machine training is mostly going for the wrong formula with completely different variables.