embedding-shape • today at 2:21 AM • 3 replies

... according to grok-4-1-fast-non-reasoning, which was the judge, on 4 tasks in total, the score was 38 to 33, so obviously huge conclusions can be made.

> We ran 4 fresh text tasks, generated on the fly for this matchup so neither model could prepare in advance, and had grok-4-1-fast-non-reasoning score each one. DeepSeek: DeepSeek V4 Pro scored 38.0 to OpenAI: GPT-5.5 Pro's 33.0.

Replies

andai • today at 2:27 AM

grok-4-1-fast was retired about a month ago.

Requests to grok-4-1-fast-non-reasoning now silently route to grok-4.3 (a 5x more expensive model), with reasoning set to "none".

https://docs.x.ai/developers/migration/may-15-retirement

TFA was published today, which implies grok-4.3 was used.

largbae • today at 2:28 AM

Pretty small sample size here, but it's hard to avoid the conclusion that DeepSeek and friends will start to put some serious downward pressure on frontier lab token pricing.

Hopefully this dynamic continues long enough to make local/private inference the leading solution for coding.

ekidd • today at 2:53 AM

The OP uses tons of typical AI turns of phrase, and Pangram classified it as AI with high confidence.

So it doesn't surprise me at all that the methodology is weak, too.
