>let alone understand nuance or sarcasm
I'm still trying to find humans that do this reliably too.
To add on, 5.2 seems to be kind of lazy by default when reading text in images. Feed it an image and it may give you only the first word or so. But coming back with the prompt 'read all the text in the image' makes it do a much better job.
With one image in particular, I thought it was hallucinating some of the words, but there was a small picture inside the picture, with tiny text that it had read and I had missed the first time.
I think a lot of AI capabilities end up kind of munged for end users because providers limit how much GPU is used per request.