Hacker News

lelanthran today at 8:15 AM | 1 reply

> Because for me it's pretty simple, it's basically free to give access to reality. Just add "sensory organs" as it were.

I dunno what you mean by "free". The model is trained on text. To "give" the model sensory organs it would need to be trained on those sensory organs.

Current models can predict text, because that's what the weights represent. Models with sensory organs will need to be trained on the output of those sensory organs.

That sounds close to impossible in the foreseeable future.


Replies

bonoboTP today at 9:05 AM

Vision and audio are already in use in multimodal LLMs. So it's possible in the past.