
jasonlotito yesterday at 8:18 PM

FTA: Both models make clear mistakes, but GPT‑5.2 shows better comprehension of the image.

You can find it right next to the image you are talking about.


Replies

tedsanders yesterday at 8:40 PM

To be fair to OP, I just added this to our blog after their comment, in response to the valid criticism that our text didn't make clear how bad GPT-5.2's labels are.

LLMs have always been very subhuman at vision, and GPT-5.2 continues in this tradition, but it's still a big step up over GPT-5.1.

One way to get a sense of how bad LLMs are at vision is to watch them play Pokemon, e.g.: https://www.lesswrong.com/posts/u6Lacc7wx4yYkBQ3r/insights-i...

They still very much struggle with basic vision tasks that adults, kids, and even animals can ace with little trouble.

da_grift_shift yesterday at 9:00 PM

'Commented after article was already edited in response to HN feedback' award