$500 GPU outperforms Claude Sonnet on coding benchmarks

64 points • by yogthos • yesterday at 5:31 PM • 14 comments

Comments

I’d encourage devs to use MiniMax, Kimi, etc. for real-world tasks that require intelligence. The downsides emerge pretty fast: much higher reasoning-token use, slower outputs, and degradation that is palpable. Sadly, you do get what you pay for right now. However, that doesn’t stop you from saving tons through smart model routing, being smart about reasoning budgets, and setting max output tokens wisely. You can also optimize your apps and prompts to reduce output tokens.
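A minimal sketch of the routing idea above, assuming a simple tiered setup: each request is classified, and the tier decides which model is called, how many reasoning tokens it may spend, and how many output tokens it may emit. The model names, limits, and the keyword heuristic are all hypothetical placeholders, not values from any real provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str              # hypothetical model identifier
    reasoning_budget: int   # cap on reasoning/thinking tokens (0 = no extended reasoning)
    max_output_tokens: int  # cap on visible output tokens


# Hypothetical tiers: names and limits are placeholders for illustration only.
TIERS = {
    "simple": Route("cheap-model", 0, 256),
    "moderate": Route("mid-model", 2_000, 1_024),
    "hard": Route("frontier-model", 8_000, 4_096),
}


def classify(task: str) -> str:
    """Toy keyword/length heuristic; a real router might use a small classifier model."""
    hard_markers = ("refactor", "debug", "architecture", "concurrency")
    if any(marker in task.lower() for marker in hard_markers):
        return "hard"
    if len(task) > 200:
        return "moderate"
    return "simple"


def route(task: str) -> Route:
    """Pick the model, reasoning budget, and output cap for a task."""
    return TIERS[classify(task)]


print(route("Rename this variable"))
print(route("Debug a race condition in the worker pool"))
```

The point of the design is that the expensive model and large budgets are opt-in per request, so most traffic only pays for the cheap tier.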


selcuka • today at 1:04 AM

It's a race to the bottom. DeepSeek beats all the others (single-shot), and it is ~50% cheaper than the cost of local electricity alone.

> DeepSeek V3.2 Reasoning 86.2% ~$0.002 API, single-shot

> ATLAS V3 (pass@1-v(k=3)) 74.6% ~$0.004 Local electricity only, best-of-3 + repair pipeline
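The quoted figures can also be compared per solved task rather than per attempt. A minimal sketch, assuming the "~$" figures are approximate per-task costs in USD and that every task is attempted exactly once:

```python
# Figures quoted in the comment above; assumed to be approximate per-task costs in USD.
results = {
    "DeepSeek V3.2 Reasoning (API, single-shot)": (0.862, 0.002),
    "ATLAS V3 (local, best-of-3 + repair)": (0.746, 0.004),
}


def cost_per_solved(pass_rate: float, cost_per_attempt: float) -> float:
    """Expected spend per correctly solved task when each task is attempted once."""
    return cost_per_attempt / pass_rate


for name, (rate, cost) in results.items():
    print(f"{name}: ${cost_per_solved(rate, cost):.5f} per solved task")
```

On these numbers the gap widens from 2x per attempt to roughly 2.3x per solved task (about $0.00232 vs. $0.00536).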


memothon • yesterday at 8:58 PM

I'm always skeptical, because you can make a model pass the benchmarks, and then when you actually use it, it isn't practically useful the way an extremely general model is.

Cool work though, really excited for the potential of slimming down models.


riidom • today at 12:04 AM

Not a word about the tok/sec, unfortunately.


superkuh • today at 1:04 AM

If anyone else was hoping this used Q8 internally, so that converting it to Q4 would let it fit in 12GB of VRAM: unfortunately, it's already at Q4_K_M (~9GB), and the 16GB requirement comes from other parts, not from a 14B model at 8-bit plus KV cache etc., as you might guess.
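The back-of-the-envelope arithmetic behind that guess can be sketched as follows. Weight size is roughly parameters × bits per weight ÷ 8. The ~4.8 bits per weight used for Q4_K_M is an approximation (the effective figure varies by model), and the KV-cache architecture values below are hypothetical placeholders; a real model's config file gives the actual layer count, KV-head count, and head dimension.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """K and V tensors per layer, each n_kv_heads * head_dim values per token (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9


print(f"14B @ 8-bit:   {weights_gb(14e9, 8):.1f} GB")
print(f"14B @ ~4.8bpw: {weights_gb(14e9, 4.8):.1f} GB")
# Hypothetical architecture values, for illustration only.
print(f"KV cache:      {kv_cache_gb(48, 8, 128, 16_384):.2f} GB")
```

A 14B model at 8-bit is about 14GB of weights alone, which is why a 16GB requirement looks like "Q8 plus cache" at first glance; at roughly 4.8 bits per weight it drops to around 8 to 9GB, in line with the ~9GB Q4_K_M file mentioned above.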

negativegate • yesterday at 11:37 PM

Am I still SOL on AMD (9070 XT) when it comes to this stuff?


