Hacker News

zozbot234, today at 9:04 AM

It's picking strange tasks that don't really play to GPT-Pro's strengths (that model is roughly comparable to Mythos and is intended for very hard reasoning and research-level problems), and then completely ignoring quite a few cases where GPT-Pro actually got things more correct than DeepSeek did. The auto-AI ranking is just not reliable for this kind of comparison.