Hacker News

Gemini Robotics On-Device brings AI to local robotic devices

216 points by meetpateltech | 06/24/2025 | 91 comments

Comments

jagger27 | 06/24/2025

These are going to be war machines, make absolutely no mistake about it. On-device autonomy is the perfect means of escaping centralized authority and accountability. There’s no human behind the drone to charge for war crimes. It’s what they’ve always dreamed of.

Who’s going to stop them? Who’s going to say no? The military contracts are too big to say no to, and they might not have a choice.

The elimination of toil will mean the elimination of humans altogether. That’s where we’re headed. There will be no profitable life left for you, and you will be liquidated by “AI-Powered Automation for Every Decision”[0]. Every. Decision. It’s so transparent. The optimists in this thread are baffling.

0: https://www.palantir.com/

baron816 | 06/24/2025

I’m optimistic about humanoid robotics, but I’m curious about the reliability issue. Biological limbs and hands are quite miraculous when you consider that they constantly interact with the world, which entails natural wear and tear, yet they constantly heal themselves.

Toritori12 | 06/24/2025

Does anyone know how easy it is to join the "trusted tester program", and whether they offer modules that you can easily plug in to run the SDK?

martythemaniak | 06/24/2025

I've spent the last few months looking into VLAs and I'm convinced they're gonna be a big deal, i.e. they very well might be the "ChatGPT moment for robotics" that everyone's been anticipating. Multimodal LLMs already have a ton of built-in understanding of images and text, so VLAs are just regular multimodal LLMs fine-tuned to output a specific sequence of instructions that can be fed to a robot.

OpenVLA, which came out last year, is a Llama 2 fine-tune with extra image encoding that outputs a 7-tuple of integers. The integers are rotation and translation inputs for a robot arm. If you give a vision-enabled Llama 2 a picture of an apple and a bowl and say "put the apple in the bowl", it already understands apples and bowls, and knows the end state should be the apple in the bowl. What's missing is the series of tuples that will correctly manipulate the arm to do that, and the way they got those is through a large number of short demonstration videos.
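
To make that concrete, here's a rough sketch of how a discretized 7-tuple could be mapped back to continuous arm commands. This is just illustrative Python, not OpenVLA's actual code; the bin count and the normalized [-1, 1] range are my assumptions:

    import numpy as np

    # Hypothetical action space: 7 dims = (dx, dy, dz, droll, dpitch, dyaw, gripper),
    # each discretized into 256 bins over an assumed [-1, 1] normalized range.
    NUM_BINS = 256
    LOW, HIGH = -1.0, 1.0

    def detokenize_action(action_tokens):
        """Map 7 integer bin indices from the model back to continuous deltas."""
        idx = np.asarray(action_tokens, dtype=np.float64)
        bin_width = (HIGH - LOW) / NUM_BINS
        return LOW + (idx + 0.5) * bin_width  # take each bin's center

    # One control step: the model emits 7 integers, the arm receives 7 floats.
    dx, dy, dz, droll, dpitch, dyaw, grip = detokenize_action([128, 130, 90, 128, 128, 128, 255])

The "action head" really is that thin; everything hard lives in the pretrained vision-language backbone.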

The neat part is that although everyone is focusing on robot arms manipulating objects at the moment, there's no reason this method can't be applied to any task. Want a smart lawnmower? It already understands "lawn", "mow", "don't destroy the toy in the path", etc.; it just needs a fine-tune on how to correctly operate a lawnmower. Sam Altman made some comments recently about having self-driving technology, and I'm certain it's a ChatGPT-based VLA. After all, if you give ChatGPT a picture of a street, it knows what's a car, a pedestrian, etc. It doesn't know how to output the correct turn/go/stop commands, and it does need a great deal of diverse data, but there's no reason why it can't do it. https://www.reddit.com/r/SelfDrivingCars/comments/1le7iq4/sa...

Anyway, super exciting stuff. If I had time, I'd rig a snowblower with a remote-control setup, record a bunch of runs, and get a VLA to clean my driveway while I sleep.
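
The data collection would be the easy part, too: each demonstration is just timestamped (camera frame, control command) pairs. A rough sketch of the logging loop, assuming OpenCV, a USB camera, and a stubbed-out remote-control reader (all the names and file formats here are made up):

    import json
    import os
    import time

    import cv2  # assumes OpenCV with a camera on index 0

    def read_remote_control():
        """Stub: return the operator's current command, e.g. (throttle, steer, chute_angle)."""
        return (0.0, 0.0, 0.0)

    os.makedirs("frames", exist_ok=True)
    cap = cv2.VideoCapture(0)

    with open("snowblower_demos.jsonl", "w") as log:
        for tick in range(600):  # ~60 seconds of driving at 10 Hz
            ok, frame = cap.read()
            if not ok:
                break
            path = f"frames/{tick:06d}.jpg"
            cv2.imwrite(path, frame)
            # One (observation, action) pair per control tick -- the raw
            # material a VLA fine-tune consumes.
            log.write(json.dumps({"image": path, "action": read_remote_control()}) + "\n")
            time.sleep(0.1)

    cap.release()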

suyash | 06/24/2025

What sort of hardware does the SDK run on? Can it run on a modern Raspberry Pi?

moelf | 06/24/2025

The MuJoCo link actually points to https://github.com/google-deepmind/aloha_sim

TZubiri | 06/25/2025

Nice. I work with some students younger than 13, so most cloud services and LLMs are quite tricky to work with; local-only models like Vertex are nice for this use case. I will try this as a replacement for ChatGPT as the computer-vision component in robotics kits like Lego Mindstorms.

zzzeek | 06/24/2025

THANK YOU.

Please make robots. LLMs should be put to work on *manual* tasks, not art/creative/intellectual tasks. The goal is to improve humanity, not to put us to work putting screws inside of iPhones.

(five years later)

what do you mean you are using a robot for your drummer

polskibus | 06/24/2025

What is the model architecture? I'm assuming it's far removed from LLMs, but I'm curious to know more. Can anyone provide links that describe VLA architectures?

Workaccount2 | 06/24/2025

I continue to be impressed by how Google stealth-releases fairly groundbreaking products and then (usually) just kind of forgets about them.

Rather than an advertising blitz and flashy press events, they just do blog posts that tech heads circulate, forget about, and then wonder 3-4 years later, "Whatever happened to that?"

This looks awesome. I look forward to someone else building a start-up on this and turning it into a great product.

sajithdilshan | 06/24/2025

I wonder what kind of guardrails (like the Three Laws of Robotics) there are to prevent the robots from going crazy while executing the prompts.

antonkar | 06/25/2025

The only way to prevent robots from being jailbroken and set to rob banks is to move the GPUs to private, SOTA-secure GPU clouds.

san1927 | 06/25/2025

Meanwhile, I will drink a coffee while it loads a reply from the API.

MidoriGlow | 06/25/2025

Elon Musk said in last week’s Starship update that the very first Mars missions are planned to be flown by Optimus humanoid robots, which will scout and build basic infrastructure before humans arrive (full transcript + audio: https://transpocket.com/share/oUKhep6cUl3s/). If Gemini Robotics On-Device can truly adapt to new tasks with ~50-100 demos, pairing that with mass-produced Optimus bodies and Starship’s lift capacity could be powerful: offline autonomy, zero-latency control, and the ability to ship dozens of robots per launch.

suninsight | 06/24/2025

This will not end well.