Hacker News

Gemini Robotics-ER 1.6

94 points by markerbrod | today at 2:02 PM | 23 comments

Comments

sho_hn | today at 2:27 PM

It does all start to feel like we could get fairly close to convincingly emulating a lot of human, or at least animal, behavior on top of the existing generative stack, by using brain-like orchestration patterns ... if only inference were fast enough to do much more of it.

The gauge-reading example here is great, but in reality, of course, having the system synthesize that Python script, run the CV tasks, come back with the answer, etc. is currently quite slow.
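
For a sense of what such a synthesized script might amount to, here is a rough OpenCV sketch of classic gauge reading: find the dial, find the needle, map its angle to a value. The detection parameters and the min/max calibration angles are illustrative assumptions, not anything taken from the linked post.

```python
import math
import cv2
import numpy as np

def read_gauge(image_path, min_angle=45.0, max_angle=315.0, min_val=0.0, max_val=10.0):
    """Needle angle -> value. Angles are clockwise from 6 o'clock; the defaults
    assume a typical 270-degree dial and must be calibrated per gauge."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Locate the circular dial; assume the first detected circle is the gauge face.
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=100)
    cx, cy, r = np.round(circles[0, 0]).astype(int)

    # Find straight segments; treat the longest one as the needle.
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                            minLineLength=int(r * 0.4), maxLineGap=10)
    needle = max(lines[:, 0], key=lambda l: math.hypot(l[2] - l[0], l[3] - l[1]))

    # The needle tip is whichever endpoint lies farther from the dial center.
    x1, y1, x2, y2 = needle
    tip = (x2, y2) if math.hypot(x2 - cx, y2 - cy) > math.hypot(x1 - cx, y1 - cy) else (x1, y1)

    # Needle angle, clockwise from 6 o'clock (image y axis points down).
    angle = math.degrees(math.atan2(-(tip[0] - cx), tip[1] - cy)) % 360

    # Linearly interpolate between the calibrated min and max marks.
    frac = (angle - min_angle) / (max_angle - min_angle)
    return min_val + frac * (max_val - min_val)
```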

Once things get much faster, you can also start to use image generation to have models extrapolate possible futures from photos they take, then describe those futures back to themselves and make decisions based on that, loops like this. I think the assumption is that our brains do similar things unconsciously, before we integrate them into our conscious conception of mind.
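
A toy sketch of that kind of loop, with `imagine`, `describe`, and `decide` as hypothetical stand-ins for an image-generation model, a captioning model, and a planner (nothing here reflects an actual Gemini API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    action: str
    imagined_frame: bytes   # generated image of the predicted outcome
    description: str        # the model's own caption of that outcome

def plan_step(current_frame: bytes,
              actions: list[str],
              imagine: Callable[[bytes, str], bytes],
              describe: Callable[[bytes], str],
              decide: Callable[[list[str]], str]) -> str:
    """One perceive -> imagine -> describe -> decide cycle."""
    candidates = []
    for action in actions:
        frame = imagine(current_frame, action)   # extrapolate one possible future
        desc = describe(frame)                   # caption the imagined outcome in text
        candidates.append(Candidate(action, frame, desc))
    # Choose based on the self-generated descriptions rather than raw pixels.
    return decide([c.description for c in candidates])
```

The notable design choice is that the decision step only ever sees the model's own text descriptions of imagined outcomes, which is the "describe them back to themselves" part of the loop.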

I'm really curious what things we could build if we had 100x or 1000x inference throughput.

vibe42 | today at 4:09 PM

A parcel of land.

A few robot legs and arms, big battery, off-the-shelf GPU. Solar panels.

Prompt: "Take care of all this land within its limits and grow some veggies."

vessenes | today at 4:08 PM

Nice. I couldn't find the part I'm most interested in, though: latency. This beats their frontier vision model on some identification tasks, but for a robotics model I'm interested in Hz. Since this is an "Embodied Reasoning" model, I'm assuming it's fairly slow and designed to pair with faster-cycle models running on the robot.

Anyway, cool.

skybrian | today at 3:15 PM

Pointing a camera at a pressure gauge and recording a graph is something that I would have found useful and have thought about writing. Does software like that exist that’s available to consumers?
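
If you wanted to roll your own, a minimal sketch might look like the following: grab a webcam frame on an interval, turn it into a reading, and append it to a CSV you can graph later in a spreadsheet or matplotlib. `read_gauge` is a placeholder for whatever actually extracts the value (classic CV, or a vision-model call).

```python
import csv
import time
import cv2

def read_gauge(frame) -> float:
    # Placeholder: plug in gauge-reading code or a vision-model call here.
    raise NotImplementedError

def log_gauge(csv_path="gauge_log.csv", interval_s=60):
    cam = cv2.VideoCapture(0)                    # first attached camera
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            ok, frame = cam.read()
            if ok:
                writer.writerow([time.time(), read_gauge(frame)])
                f.flush()                        # keep the log durable between samples
            time.sleep(interval_s)
```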

gallerdude | today at 2:50 PM

I’ve been thinking about AI robotics lately… even if the labs internally have a GPT-2 or GPT-3 “equivalent” for robotics, you can’t really release that. If a robot unloading your dishwasher breaks one of your dishes even once, that’s a massive failure.

So there might be awesome progress behind the scenes, just not ready for the general public.

jeffbee | today at 2:47 PM

Showing the murder dog reading a gauge using $$$ worth of model time is kinda not an amazing demo. We already know how to read gauges with machine vision. We also know how to order digital gauges out of industrial catalogs for under $50.
