Hacker News

voxleone · yesterday at 5:51 PM · 8 replies

I'd say with confidence: we're living in the early days. AI has made jaw-dropping progress in two major domains: language and vision. With large language models (LLMs) like GPT-4 and Claude, and vision models like CLIP and DALL·E, we've seen machines that can generate poetry, write code, describe photos, and even hold eerily humanlike conversations.

But as impressive as this is, it’s easy to lose sight of the bigger picture: we’ve only scratched the surface of what artificial intelligence could be — because we’ve only scaled two modalities: text and images.

That’s like saying we’ve modeled human intelligence by mastering reading and eyesight, while ignoring touch, taste, smell, motion, memory, emotion, and everything else that makes our cognition rich, embodied, and contextual.

Human intelligence is multimodal. We make sense of the world through:

Touch (the texture of a surface, the feedback of pressure, the warmth of skin); Smell and taste (deeply tied to memory, danger, pleasure, and even creativity); Proprioception (the sense of where your body is in space — how you move and balance); Emotional and internal states (hunger, pain, comfort, fear, motivation).

None of these are captured by current LLMs or vision transformers. Not even close. And yet, our cognitive lives depend on them.

Language and vision are just the beginning — the parts we were able to digitize first, not necessarily the most central to intelligence.

The real frontier of AI lies in the messy, rich, sensory world where people live. We’ll need new hardware (sensors), new data representations (beyond tokens), and new ways to train models that grow understanding from experience, not just patterns.


Replies

dinfinity · yesterday at 6:39 PM

> Language and vision are just the beginning — the parts we were able to digitize first, not necessarily the most central to intelligence.

I respectfully disagree. Touch gives pretty cool skills, but language, video and audio are all that are needed for all online interactions. We use touch for typing and pointing, but that is only because we don't have a more efficient and effective interface.

Now I'm not saying that all other senses are uninteresting. Integrating touch, extensive proprioception, and olfaction is going to unlock a lot of 'real world' behavior, but your comment was specifically about intelligence.

Compare humans to apes and other animals and the thing that sets us apart is definitely not in the 'remaining' senses, but firmly in the realm of audio, video and language.

slashdave · today at 5:59 AM

> modeled human intelligence

That's not what these models do

mr_world · yesterday at 8:33 PM

Organic adaptation and persistence of memory are, I would say, the two major advancements that need to happen.

Human neural networks are dynamic: they change and rearrange, grow and sever. An LLM is fixed and relies on context. If you give it the right answer, it won't "learn" that it is the correct answer unless it is fed back into the system and trained in over months. And what if it's only the right answer for a limited period of time?

To build an intelligent machine, it must be able to train itself in real time and remember.
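The distinction above can be sketched in a toy model (hypothetical and illustrative only — `FrozenLLM`, `tell`, and `retrain` are invented names, not any real API): in-context "learning" lives only in the session, while only a weight update survives a reset.

```python
# Illustrative sketch: a frozen model adapts via context, but that
# adaptation vanishes when the context is cleared; only retraining
# (a weight update) persists across sessions.

class FrozenLLM:
    def __init__(self):
        self.weights = {"capital of France": "Paris"}  # fixed at training time
        self.context = []                              # per-session, ephemeral

    def tell(self, fact, answer):
        # In-context "learning": lives only in the context window.
        self.context.append((fact, answer))

    def ask(self, question):
        # Context overrides weights within a session...
        for fact, answer in reversed(self.context):
            if fact == question:
                return answer
        return self.weights.get(question, "unknown")

    def new_session(self):
        self.context = []  # ...but is lost when the session ends

    def retrain(self, fact, answer):
        # The slow path described above: feeding corrections back
        # into training so they persist in the weights.
        self.weights[fact] = answer

m = FrozenLLM()
m.tell("tallest mountain", "Everest")
print(m.ask("tallest mountain"))   # "Everest" (from context)
m.new_session()
print(m.ask("tallest mountain"))   # "unknown" (context gone, weights unchanged)
m.retrain("tallest mountain", "Everest")
m.new_session()
print(m.ask("tallest mountain"))   # "Everest" (now in the weights)
```

A real-time learner, in this framing, would be one where something like `retrain` happens continuously and cheaply, rather than as a months-long offline process.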

chasd00 · yesterday at 7:00 PM

> Language and vision are just the beginning..

Based on the architectures we have, they may also be the ending. There’s been a lot of news in the past couple of years about LLMs, but have there been any breakthroughs making headlines anywhere else in AI?

Swizec · yesterday at 5:59 PM

> The real frontier of AI lies in the messy, rich, sensory world where people live. We’ll need new hardware (sensors), new data representations (beyond tokens), and new ways to train models that grow understanding from experience, not just patterns.

Like Doctor Who said: Daleks aren't brains in a machine, they are the machine!

Same is true for humans. We really are the whole body, we're not just driving it around.

skydhash · yesterday at 6:09 PM

Yeah, but are there new ideas or only wishes?

timewizard · today at 12:24 AM

> has made jaw-dropping progress

They took 1970s dead tech and deployed it on machines a million times more powerful. I'm not sure I'd qualify that as progress. I'd also need an explanation of what systemic improvements in models and computation are planned that would give exponential growth in performance.

I don't see anything.
