djoldman • yesterday at 7:18 PM • 3 replies

Interesting "ScreenSpot Pro" results:

    72.7% Gemini 3 Pro
    11.4% Gemini 2.5 Pro
    49.9% Claude Opus 4.5
    3.50% GPT-5.1

ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

https://arxiv.org/abs/2504.07981
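
For context on what these percentages measure: ScreenSpot-style grounding benchmarks, as I understand them, count a prediction as correct when the model's predicted click point lands inside the ground-truth bounding box of the target UI element. Here is a minimal sketch of that point-in-box accuracy, assuming that metric. The `Box` class, `grounding_accuracy` function and all coordinates are hypothetical illustrations, not the benchmark's official evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned target bounding box in screenshot pixel coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Inclusive edges: a click exactly on the border still counts as a hit.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(predictions, targets):
    """Fraction of predicted (x, y) click points that fall inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("need exactly one prediction per target")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    # Hypothetical toy data: three small targets on a 3840x2160 screenshot.
    targets = [
        Box(100, 100, 120, 116),
        Box(2000, 50, 2024, 70),
        Box(3700, 2000, 3716, 2016),
    ]
    # The third click misses its 16 px target by about 100 px.
    predictions = [(110, 108), (2010, 60), (3600, 1990)]
    print(f"{grounding_accuracy(predictions, targets):.1%}")  # 66.7%
```

Because targets in professional apps are often only a few pixels across, a near miss scores the same as a wild guess, which is one reason scores on this kind of benchmark can spread so widely.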

Replies

simonw • yesterday at 8:12 PM

I was surprised at how poorly GPT-5 did in comparison to Opus 4.1 and Gemini 2.5 on a pretty simple OCR task a few months ago. I should run that again against the latest models and see how they do. https://simonwillison.net/2025/Aug/29/the-perils-of-vibe-cod...

jasonjmcghee • yesterday at 8:07 PM

That is... astronomically different. Is GPT-5.1 downscaling and losing critical information or something? How could it be so different?
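
The downscaling guess can be made concrete with a little arithmetic. This is a minimal sketch, assuming a hypothetical pipeline that uniformly resizes a screenshot so its longer side is at most 1024 px. That cap is an illustrative number, not a documented limit of GPT-5.1 or any other model here, and `downscale_factor` and `scaled_size` are hypothetical helpers.

```python
def downscale_factor(width: int, height: int, max_long_side: int) -> float:
    """Uniform scale factor when the longer image side is capped (never upscales)."""
    return min(1.0, max_long_side / max(width, height))


def scaled_size(w: float, h: float, factor: float) -> tuple[float, float]:
    """Size of a UI element after the whole screenshot is resized by `factor`."""
    return w * factor, h * factor


if __name__ == "__main__":
    # Hypothetical numbers: a 4K screenshot and a 1024 px long-side cap.
    factor = downscale_factor(3840, 2160, 1024)
    icon_w, icon_h = scaled_size(16, 16, factor)
    print(f"factor={factor:.3f}, 16x16 icon -> {icon_w:.1f}x{icon_h:.1f} px")
    # factor=0.267, 16x16 icon -> 4.3x4.3 px
```

Under those assumptions a 16 px icon shrinks to roughly 4 px, which would plausibly make precise clicking much harder. Whether any of these models actually resizes inputs this way isn't established in the thread.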

agentifysh • yesterday at 7:29 PM

Impressive... most impressive.

It's going to reach the low 90s very soon if trends continue.