Hacker News

nharada today at 1:11 AM

I think the assumption is valid. Most of the reasoning components of next-gen (and some current-gen) robotics will use VLMs to some extent. Deciding whether a temporary construction sign is valid seems to fall under this use case.


Replies

theamk today at 5:12 AM

But unless you are using a single, end-to-end model for the entire driving stack, that "proceed" command will never influence the accelerator pedal.

Sure, there will be a VLM for reading the signs, but the worst it'd be able to output is something like "there is a 'detour' sign at (123, 456) pointing to road #987" - and some other, likely non-LLM, mechanism will ensure that following that road is actually safe.
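
To make that separation concrete, here is a rough sketch of what I mean. Everything in it is hypothetical: the names (`SignObservation`, `is_route_safe`, `plan`) and the toy road sets are made up for illustration and are not taken from any real driving stack. The point is only that the VLM's output is a typed observation, and a separate deterministic check decides whether to act on it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SignObservation:
    """The only thing the (hypothetical) VLM is allowed to emit."""
    kind: str                      # e.g. "detour"
    position: Tuple[float, float]  # where the sign was seen
    target_road: int               # road the sign points to


# Toy stand-ins for the map and obstacle data a real planner would consult.
DRIVABLE_ROADS = {987, 988}
BLOCKED_ROADS = {988}


def is_route_safe(road_id: int) -> bool:
    """Non-LLM check: the road must exist on the map and not be blocked."""
    return road_id in DRIVABLE_ROADS and road_id not in BLOCKED_ROADS


def plan(observation: SignObservation, current_road: int) -> int:
    """Follow a detour only if the independent safety check agrees."""
    if observation.kind == "detour" and is_route_safe(observation.target_road):
        return observation.target_road
    # Unknown sign kinds (including a bogus "proceed") change nothing.
    return current_road


if __name__ == "__main__":
    sign = SignObservation("detour", (123.0, 456.0), 987)
    print(plan(sign, current_road=100))
```

In this sketch a sign that says "proceed" is just an observation with an unrecognized `kind`, so the planner stays on its current road; nothing the VLM emits reaches the controls directly.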