Hacker News

acheong08 today at 4:55 PM

Thinking along the lines of speed, I wonder if a model that can reason and use tools at 60fps would be able to control a robot with raw instructions and perform skilled physical work that is currently limited by the text-only output of LLMs. It also helps that the Gemini series is really good at multimodal processing with images and audio. Maybe they can also encode sensory inputs in a similar way.

Pipe dream right now, but 50 years from now? Maybe.


Replies

incognito124 today at 5:21 PM

Believe it or not, there's Gemini Robotics, which seems to be exactly what you're talking about:

https://deepmind.google/models/gemini-robotics/

Previous discussions: https://news.ycombinator.com/item?id=43344082

iamgopal today at 5:02 PM

Much sooner. Hardware, power, software, even AI model design, inference hardware, cache: everything is being improved, and it's exponential.