Stitch4223 • today at 3:47 AM • 1 reply • view on HN

It’s four poorly constructed arbitrary experiments which say very little about the competency of either model.

The article reads like thin, auto-generated AI clickbait for nerd sniping or shilling a model.

Consider the lead:

> DeepSeek V4 Pro wins this head-to-head by being more exact where it matters: following instructions, matching schemas, and solving edge cases cleanly. GPT-5.5 Pro is still strong, but it gave away points with avoidable deviations.

“Where it matters”, “cleanly”, “is still strong”: these are vague phrases, used instead of simply saying that DeepSeek yielded more concise results in 3 out of 4 tests.

1 star.

Replies

jampekka • today at 8:56 AM

(Three out of) four experiments is anecdotal for sure, but the result meshes with more established instruction-following benchmarks (although DeepSeek V4 Pro does not top these): https://artificialanalysis.ai/evaluations/ifbench

I found the writing clear and quite even-handed. The lead is a bit salesy, but leads typically are. Knee-jerk dismissals based on vibes that something is LLM-generated are quite low-effort.
