Hacker News

djoldman · yesterday at 6:40 PM · 6 replies

I'm genuinely asking (not trying to be snarky)... Why are these robots so slow?

Is it a throughput constraint given too much data from the environment sensors?

Is it processing the data?

I'm curious about where the bottleneck is.


Replies

ajhai · today at 1:18 AM

It is inference latency most of the time. These VLA (vision-language-action) models take in an image + robot state + text and spit out a set of joint-angle deltas.

Depending on the model, we may get a single set of joint-angle deltas or a series of them. To complete a task, the robot captures camera images and current joint angles and sends them to the model along with the task text, getting back the joint-angle changes to apply. Once the joints are updated, it checks whether the task is complete (this signal can come from the model too). That loop runs until the task is done.

Combine this with the motion planning needed to verify that the commanded joint angles are safe and won't collide with the surroundings, and the result is overall slowness.
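The loop described above can be sketched roughly like this (all the names here — `capture_observation`, `policy.infer`, `is_collision_free`, and so on — are hypothetical placeholders, not any real robot SDK; joint state is a single float purely for illustration):

```python
import time

def run_task(task_text, policy, robot, planner, hz=5, timeout_s=120):
    """Hypothetical VLA control loop: observe -> infer -> plan -> act."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        obs = robot.capture_observation()            # camera images + joint angles
        deltas, done = policy.infer(obs, task_text)  # model inference: the slow part
        if done:
            return True
        for delta in deltas:                         # some models emit a chunk of steps
            target = robot.joint_angles() + delta
            if planner.is_collision_free(target):    # safety check before each move
                robot.move_to(target)
            time.sleep(1.0 / hz)                     # control-loop rate limit
    return False
```

Note that every pass around the loop pays for one full model inference plus a planner check, which is why the effective control rate ends up so low.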

michaelt · yesterday at 8:10 PM

When you're operating your robot around humans, you want to be very confident it won't injure anyone. It'd be pretty bad if a bug in your code meant instead of putting the cast iron frying pan in the dishwasher, it sent it flying across the room.

One way of doing that is to write code with no bugs or unpredictable behaviour, a nigh-impossible feat - especially once you've got ML models in the mix.

Another option is to put a guard cage around your robot so nobody can enter pan-throwing distance without deactivating the robot first. But obviously that's not practical in a home environment.

Another option is just to go slowly all the time. The pan won't fly very far if the robot only moves 6 inches per second.
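That "just go slowly" option is often enforced as a simple velocity clamp on each commanded step. A minimal sketch (the 0.15 m/s limit — roughly 6 inches per second — and the 1-D position are illustrative assumptions, not any particular robot's safety spec):

```python
def clamp_step(current, target, max_speed=0.15, dt=0.01):
    """Limit per-tick motion so speed never exceeds max_speed (m/s).

    current/target are 1-D positions (meters) for simplicity;
    dt is the control period in seconds.
    """
    step = target - current
    max_step = max_speed * dt          # farthest we may move this tick
    if abs(step) > max_step:
        step = max_step if step > 0 else -max_step
    return current + step
```

However far away the model's target is, the robot only ever moves `max_speed * dt` per tick toward it.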

robopolicy · yesterday at 6:49 PM

Part of it is that training of these VLAs currently happens on human teleop data which limits speed (both for safety reasons and because of actual physical speed constraints in the teleoperation pipeline).

Let’s see how it changes once these pipelines follow the LLM recipes to use more than just human data…

ethan_smith · yesterday at 8:44 PM

The primary bottleneck is typically the motion planning system that must continuously solve complex optimization problems to ensure safe trajectories while avoiding collisions in dynamic environments.

dheera · yesterday at 7:33 PM

Not a PI employee, but diffusion policies are like diffusion models for image generation: they generate actions from noise in multiple refinement steps. With current compute you can't run 100+ Hz control loops with that kind of architecture.
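The cost structure looks something like the sketch below — not the real DDPM update math, just illustrating why it's slow: each refinement step is a full network forward pass, so 50 steps at, say, 10 ms per pass is already 500 ms per action chunk, nowhere near a 100 Hz loop (`denoiser`, `action_dim`, and the step count are all hypothetical):

```python
import random

def sample_actions(denoiser, obs, action_dim=7, steps=50):
    """Schematic diffusion-policy sampling: start from noise, refine repeatedly.

    denoiser(actions, obs, t) -> per-dimension noise estimate (one full
    network forward pass per call -- this is the expensive part).
    """
    # Start from pure Gaussian noise in action space.
    actions = [random.gauss(0.0, 1.0) for _ in range(action_dim)]
    for t in reversed(range(steps)):
        noise_est = denoiser(actions, obs, t)     # one forward pass per step
        actions = [a - n / steps for a, n in zip(actions, noise_est)]
    return actions
```

The total latency scales linearly with the number of denoising steps, which is exactly what distillation (fewer steps) and faster compute (cheaper steps) attack.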

Some combination of distillation, new architectures, and faster compute can eventually attack these problems. Historically, as long as something in tech has been shown to be possible, speed has almost always been a non-issue in the years afterwards.

For now even getting a robot to understand what to do in the physical world is a major leap from before.
