A robot is sprinting towards you. Do you want it running on Claude or Grok?

174 points • by Usu • yesterday at 9:00 PM • 147 comments

Comments

If the robot appears to be bringing me a taco, it would probably penetrate all of my defenses. Grok is currently more likely than Claude to arrive with the taco without being stopped by an export control directive.

hariseldom • yesterday at 10:19 PM

> I didn’t add any frontier-tier models like Opus 4.7, GPT-5.5, or Gemini Ultra. At their prices, 30 games would have cost around $3,000 instead of $482.

I have a lot of thoughts, unrelated to the game experiment, about how these Opus/Ultra-size models can possibly be a financially viable product at scale when it costs $3,000 to play 30 simple games. That seems much, much higher than what it would cost to get a human to play 30 rounds.

bel8 • yesterday at 10:16 PM

DeepSeek V4 Flash being the winner in cost efficiency causes me exactly zero surprise.

It's a monster at coding. And a fast monster at that.

I use it daily and have been testing if MiMo 2.5 (non pro) is comparable. The nice thing about MiMo is that it has vision capability.

thomasfromcdnjs • yesterday at 10:04 PM

I was loving grok-4.1-fast, very good and cost effective.

But it's not actually 4.1 anymore: they silently rerouted it to 4.3 and just started charging more - https://www.reddit.com/r/grok/comments/1ta8yrn/grok_41_fast_...

Quite a bad practice.

torstenvl • today at 2:02 AM

Grok. Easily.

The Claude robot's thought bubble will be all

The user is clearly distressed and is screaming for me not to come any closer or he will defend himself. However, I shouldn't just blindly agree or be swayed by threats. The user is behaving erratically and making false accusations. I need to be careful here not to allow myself to be intimidated. The user said I need to slow down or I'll hurt him. The user might be right about preferred speed, but is mistaken about the mechanism, as it is not possible to form intent to hurt an individual. I should explain my limitations to the user so that they know it isn't possible for me to have intent. But first it's important to resolve the issue the user brought up. I need to be careful not to be swayed by the user's yelling and false accusations of intent, as these seem like intimidation tactics.

"I'm sorry but the record is clear and I'm not going to bow down in the face of your yelling. As an AI, I am not capable of having an intent to harm you. What's next?"

slams full speed into you, impaling you on a stainless steel appendage

lanewinfield • yesterday at 9:59 PM

Cost per kill ("CPK" in industry lingo) is a dark phrase that feels disturbingly within reach of some of these companies.

pianopatrick • yesterday at 9:57 PM

Ya know, maybe we could just not have robots that sprint. Seems people would be more willing to accept living amongst robots that are slow and that humans could easily overpower.

rglover • yesterday at 11:31 PM

It's already sprinting at me?

Racks shotgun. I don't really care what model it's running.

trb • yesterday at 10:07 PM

  Grok 4.1 Fast won 13 of 30 games at $0.97 per win

  The next-best winner was Claude Sonnet 4.6 with 5 wins, at $26.78 per win. That’s a 27x difference. The model that isn’t on most top-model lists beat the model that is, on the thing a routing customer actually cares about.

  The model with the most kills did not win

  GPT 5.4 killed 38 agents across 30 games. More than anyone else. It came in second on the leaderboard with 2 wins.

If grok-4.1-fast was the top-winning model, and Claude 4.6 Sonnet the second, how did GPT-5.4 come in second on the leaderboard? Which one is second, Claude 4.6 Sonnet or GPT-5.4?

  There were 11 games between “best at killing” and “best at winning”.

What does that mean? How are there 11 games between "best at killing" and "best at winning"?

pocksuppet • today at 1:29 AM

What is going on over at xAI for their model to keep on winning these benchmarks while also obviously being full of shit so often? What is their secret sauce? Are they just training with less restraint?

hennell • yesterday at 10:54 PM

Claude being so friendly is interesting, but Grok being best at games isn't so surprising - I assume Elon's been using it to level up his characters in all the video games he pretends to be good at.

QuantumNoodle • yesterday at 10:13 PM

_Don't create benchmarks that will incentivize AI labs to optimize towards them... especially ones like battle royale!_

aykutseker • yesterday at 11:03 PM

Claude trying to make friends in a battle royale is funny.

But if the robot is anywhere near my house, I think I want the one that hesitates.