When humans, or dogs or cats for that matter, react to a novel situation, when they appear to generalize or synthesize diverse prior experience into a novel reaction, that new experience and new reaction feed directly back into their mental model and alter it on the fly. They don't just tack on a new memory. New experience and new information back-propagate, constantly adjusting the weights and meanings of prior memories. This is a more multi-dimensional alteration than simply re-training a model to come up with a new right answer: it also exposes to the human mental model all the potential flaws in all the previous answers, which may have been sufficiently correct before.
This is why, for example, a 30-year-old can lose control of a car on an icy road and then suddenly, in the half second before crashing, remember a time they intentionally drifted a car on the street at 16 and reflect on how stupid they were. In the human or animal mental model, every event is recalled by way of other things, and all of them are constantly adapting, including memories of the past.
The tokens we take in and process are not words, nor spatial artifacts. We read a whole model as a token, and our output is a vector of weighted models that we somewhat trust and somewhat discard. When you meet a new person, you compare all their apparent models to the ones you know: facial models, audio models, language models, political models. You ingest their vector of models as tokens and attempt to compare them to your own existing ones, while updating yours at the same time. Only once our thoughts have arranged the competing models we hold into some kind of hierarchy do we poll them for the ones appropriate to synthesize words or actions from.