Hacker News

embedding-shape today at 2:21 AM

... according to grok-4-1-fast-non-reasoning, which was the judge, on 4 tasks in total, the score was 38 to 33, so obviously huge conclusions can be drawn.

> We ran 4 fresh text tasks, generated on the fly for this matchup so neither model could prepare in advance, and had grok-4-1-fast-non-reasoning score each one. DeepSeek: DeepSeek V4 Pro scored 38.0 to OpenAI: GPT-5.5 Pro's 33.0.
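To put a number on how little 4 tasks can show, here is a minimal sketch of an exact two-sided sign test. It assumes each task is reduced to a plain win or loss for one model. The post doesn't give per-task scores, so the win counts below are hypothetical, and `sign_test_p_value` is just an illustrative name, not anything from the article. Even a clean 4-0 sweep only gets to p = 0.125, well short of the usual 0.05 bar.

```python
from math import comb


def sign_test_p_value(wins: int, n: int) -> float:
    """Exact two-sided sign test (hypothetical helper for illustration).

    Returns the probability, if the two models were really equal (each task a
    fair coin flip), of a split at least as lopsided as `wins` out of `n`.
    Assumes each task is scored as a simple win/loss, ignoring score margins.
    """
    k = max(wins, n - wins)
    # Probability that one side wins k or more of the n tasks by chance.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Hypothetical best case for either model: a 4-0 sweep of the 4 tasks.
print(sign_test_p_value(4, 4))  # 0.125
# Hypothetical 3-1 split.
print(sign_test_p_value(3, 4))  # 0.625
```

The sign test is a deliberately crude choice here: with only 4 tasks and an LLM judge, there isn't enough data to justify anything that models the score margins.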


Replies

andai today at 2:27 AM

grok-4-1-fast was retired about a month ago.

Requests to grok-4-1-fast-non-reasoning now silently route to grok-4.3 (a 5x more expensive model), with reasoning set to "none".

https://docs.x.ai/developers/migration/may-15-retirement

TFA was published today, which implies grok-4.3 was used.

largbae today at 2:28 AM

Pretty small sample size here, but it's hard to avoid the conclusion that DeepSeek and friends will start to put some serious downward pressure on frontier lab token pricing.

Hopefully this dynamic continues long enough to make local/private inference the leading solution for coding.

ekidd today at 2:53 AM

The OP uses tons of typical AI turns of phrase, and Pangram classified it as AI with high confidence.

So it doesn't surprise me at all that the methodology is weak, too.