Hacker News

plaidfuji, today at 1:46 AM

> The first requirement is that the computer program has a body (either physical or virtual) and sense organs

OK, deploy a local model on a lightweight edge compute device, strap it to a wheeled chassis, and attach a cheap webcam.

> Then I’d want to see an embodied agent that could navigate its environment in order to survive as well as, say, a lizard can

Give the robot appendages that let it plug itself into a standard wall outlet, guided by a vision model running on its webcam feed. As long as it can feed itself, it can survive long enough.

> Next I would want to see an embodied agent with the same capacity to deal with novel situations as a mouse.

I think if you fed frames from the webcam into a local VLM every 5 seconds, it could assess the situation and respond with simple actions (turn, advance, retreat).
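A rough sketch of that loop, to make it concrete. Everything here is an assumption for illustration: `capture_frame`, `query_vlm` and `act` are hypothetical callables standing in for whatever webcam, local VLM and motor driver you actually have, and falling back to "retreat" on an unparseable reply is just one cautious choice.

```python
import time
from typing import Callable, List

# The three simple actions from the comment above.
ACTIONS = ("turn", "advance", "retreat")


def parse_action(reply: str, default: str = "retreat") -> str:
    """Return the first known action word in the VLM's free-text reply.

    Falls back to `default` (an assumed safe choice) if none is found.
    """
    words = reply.lower().replace(",", " ").replace(".", " ").split()
    for word in words:
        if word in ACTIONS:
            return word
    return default


def control_loop(
    capture_frame: Callable[[], bytes],      # hypothetical: grabs one webcam frame
    query_vlm: Callable[[bytes, str], str],  # hypothetical: local VLM call
    act: Callable[[str], None],              # hypothetical: drives the motors
    steps: int,
    period_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run `steps` sense-decide-act cycles, one every `period_s` seconds."""
    prompt = "You are a small wheeled robot. Reply with one word: turn, advance, or retreat."
    taken = []
    for _ in range(steps):
        frame = capture_frame()
        action = parse_action(query_vlm(frame, prompt))
        act(action)
        taken.append(action)
        sleep(period_s)
    return taken
```

Passing the camera, model and motors in as plain functions means you can dry-run the loop with fakes before any hardware exists.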

> After that I’d want to see agents whose social dynamics are as complex as those of wolves, and then agents with the tool-making abilities of chimpanzees.

Social dynamics could be implemented in many ways, maybe by transmitting tokens over RF? Idk. Then you have a receiver that picks them up and feeds them into some LLM frontend, which decides whether to add them to a global context file that guides the VLM action-taker. A new action could be to broadcast a token message. Tool-making would have to be code-based. Physical tools are hard and still unsolved.
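The receive-and-filter step might look something like this. It is only a sketch under assumptions: `is_relevant` is a hypothetical stand-in for the LLM frontend's keep-or-drop decision, the "global context file" is modeled as an in-memory list rather than a real file, and capping it at `max_entries` is my addition to keep the prompt from growing forever.

```python
from typing import Callable, List


def handle_incoming(
    message: str,
    is_relevant: Callable[[str, List[str]], bool],  # hypothetical LLM judgment
    context: List[str],
    max_entries: int = 20,
) -> bool:
    """Append `message` to the shared context if the judge accepts it.

    Returns True if the message was kept. The oldest entries are dropped
    once the context exceeds `max_entries`.
    """
    if not is_relevant(message, context):
        return False
    context.append(message)
    overflow = len(context) - max_entries
    if overflow > 0:
        del context[:overflow]  # discard the oldest messages first
    return True


def render_context(context: List[str]) -> str:
    """Format the kept messages as a block to prepend to the action prompt."""
    return "\n".join(f"- {entry}" for entry in context)
```

The output of `render_context` is what would get prepended to the action-taker's prompt, so messages from other agents can influence the next move.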

> At that point I would want to see people successfully teaching such embodied agents how to communicate their desires

This part is relatively straightforward except for the “via nonlinguistic modality”.

Anyway. These are all engineering problems. Personally I would demand to see the AI reproduce its body under its own power and volition. That’s a pretty neat trick we’ve got going for us.