zozbot234 • today at 9:04 AM • 0 replies

It picks strange tasks that don't really play to GPT-Pro's strengths (that model is roughly comparable to Mythos and is intended for very hard reasoning and research-level problems), and then completely ignores quite a few cases where GPT-Pro actually got things more right than DeepSeek did. The auto-AI ranking just isn't reliable for this kind of thing.

alt Hacker News