Thinking along the lines of speed, I wonder if a model that could reason and use tools at 60fps would be able to control a robot directly and perform skilled physical work that's currently out of reach for text-only LLM output. It also helps that the Gemini series is really good at multimodal processing of images and audio. Maybe they could encode other sensory inputs in a similar way.
Pipe dream right now, but 50 years from now? Maybe.
Much sooner. Hardware, power, software, AI model design, inference hardware, caching: everything is improving, and the progress is exponential.
Believe it or not, there's Gemini Robotics, which seems to be exactly what you're talking about:
https://deepmind.google/models/gemini-robotics/
Previous discussion: https://news.ycombinator.com/item?id=43344082