But if you squint, sensory actions and reactions are also sequential tokens. Even reactions could be encoded alongside input as action tokens, forming a single token stream. Has anyone tried something like this?
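For concreteness, here's a minimal sketch of what that could look like (all names, the vocabulary split, and the crude quantization are made up for illustration, not any existing system's API): discretize sensor readings and actions into disjoint token ranges and interleave them into one flat stream, which a decoder-only model could then be trained on with plain next-token prediction.

```python
from typing import Iterable

# Hypothetical vocabulary layout: observation tokens first, actions after.
N_OBS_TOKENS = 1024          # ids [0, 1024) for discretized sensory readings
N_ACT_TOKENS = 256           # ids [1024, 1280) for discretized actions
ACT_OFFSET = N_OBS_TOKENS

def encode_obs(obs: list[float]) -> list[int]:
    # Crude uniform quantization, assuming each reading lies in [-1, 1].
    return [max(0, min(int((x + 1.0) / 2.0 * N_OBS_TOKENS), N_OBS_TOKENS - 1))
            for x in obs]

def encode_action(action: int) -> int:
    assert 0 <= action < N_ACT_TOKENS
    return ACT_OFFSET + action

def to_token_stream(episode: Iterable[tuple[list[float], int]]) -> list[int]:
    # Interleave: obs tokens, then the action taken, then the next obs, ...
    stream: list[int] = []
    for obs, action in episode:
        stream.extend(encode_obs(obs))
        stream.append(encode_action(action))
    return stream

episode = [([0.0, 0.5], 3), ([0.1, 0.4], 7)]
print(to_token_stream(episode))  # one flat sequence of ints
```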
> But if you squint, sensory actions and reactions are also sequential tokens
I'm not sure you could model it that way.
Animal brains don't necessarily just react to sensory input: they have frequently already predicted the next state based on the previous state and on learning/experience, and not just in a simple sequential manner but at many different levels of pattern simultaneously (a local immediate action versus actions that are part of a larger structure of behavior).

Sensory input is compared to the predicted state, and the differences are incorporated into the flow, as in the sketch below.
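As a toy illustration of that comparison step (purely a sketch of the idea, not a model of real neurons; the trivial "next state = current state" predictor and the alpha gain are arbitrary choices), the internal state is updated from the prediction error rather than from the raw input:

```python
def predictive_loop(inputs, alpha=0.5):
    # Internal estimate of the world; only the surprise (error) is processed.
    state = 0.0
    for sensed in inputs:
        predicted = state            # trivial predictor: next state = current
        error = sensed - predicted   # difference between input and prediction
        state = predicted + alpha * error
        yield predicted, error, state

signal = [0.0, 0.1, 0.9, 1.0, 1.0, 0.2]
for pred, err, st in predictive_loop(signal):
    print(f"predicted={pred:.2f} error={err:+.2f} state={st:.2f}")
```

A flat next-token model collapses all of this into one level of sequence prediction, whereas the point above is that brains seem to run this error correction at many levels simultaneously.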
The key thing is that our brains are modeling and simulating the world around us and its future state (the physical world as well as the abstract world of what other animals are thinking). It's not clear that LLMs are doing that; my assumption is that they are not, and until we build systems that do, we won't be moving toward the kind of flexible and adaptable control our brains have.
Edit: I just read the rest of the parent post, which said basically the same thing; I was skimming so I missed it.