lordmauve • today at 5:27 PM • 1 reply

Given DeepSWE just blew apart the SWE-Bench Pro benchmark and handed a 14-point lead to GPT-5.5, it looks pretty bad that they've listed SWE-Bench first in the model release and left out DeepSWE. Like, this isn't obviously an answer.

Or maybe it is, but publish the DeepSWE numbers so we can see for ourselves.

Replies

phainopepla2 • today at 6:08 PM

I'm highly skeptical of DeepSWE. It rates GPT-5.4-mini as three times better than deepseek-v4-pro, but every time I use GPT-5.4-mini I find that it completely sucks at following directions.
