I think this is an attempt to enrich the locality model in transformers.
One of the weird things you do in transformers is add a position vector, which captures the distance between the token being attended to and some other token.
This is obviously not powerful enough to express non-linear relationships - like graph relationships.
This person seems to be experimenting with pre-processing the input token sequence, linearly reordering it by some other heuristic that might map more closely to the actual underlying relationships between tokens.
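A minimal sketch of what that kind of pre-processing could look like, assuming per-token embeddings are available and using a greedy nearest-neighbour chain as a stand-in heuristic (the actual reordering criterion is whatever the author's graph distance gives; `reorder_by_similarity` and the cosine-similarity choice here are purely illustrative):

```python
import numpy as np

def reorder_by_similarity(tokens: list[str], embed: dict[str, np.ndarray]) -> list[str]:
    """Greedily chain tokens so each token is followed by its most
    similar not-yet-placed token, producing a linear reordering that
    a standard transformer (with its 1-D position vectors) can consume."""
    remaining = list(range(len(tokens)))
    order = [remaining.pop(0)]  # arbitrarily start from the first token
    while remaining:
        last = embed[tokens[order[-1]]]
        # pick the unplaced token whose embedding is closest (cosine) to the last one
        best = max(
            remaining,
            key=lambda i: float(
                np.dot(last, embed[tokens[i]])
                / (np.linalg.norm(last) * np.linalg.norm(embed[tokens[i]]) + 1e-9)
            ),
        )
        remaining.remove(best)
        order.append(best)
    return [tokens[i] for i in order]
```

The point is that after reordering, plain sequential position distance approximates distance under the chosen heuristic, so the vanilla positional scheme can stand in for a richer relationship.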
> This is obviously not powerful enough to express non-linear relationships - like graph relationships.
The distance metric used is based on energy-informed graphs that encode energy relations in a distribution called taumode; see my previous paper on spectral indexing for vector databases for a complete roll-out.
Adding the position vector is basic, sure, but it's naive to think the model doesn't develop its own positional system bootstrapping on top of the barebones one [1].
[1] https://aclanthology.org/Q16-1024/
As this model allows mixing and matching various contexts, one thing I did was use a word-sorted context. This effectively transforms a position-based context into a word-set-based context, where "you and me", "me and you" and "and me you" are all the same.
This allowed for longer contexts and better prediction.
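A minimal sketch of the word-sorted idea, assuming simple whitespace tokenisation (the model's actual tokenizer isn't specified here):

```python
def word_sorted_context(text: str) -> tuple[str, ...]:
    """Canonicalise a context window into an order-free representation:
    sorting the tokens collapses every permutation of the same word set
    onto one canonical sequence before it is fed to the model."""
    return tuple(sorted(text.split()))

# all three permutations map to the same key: ('and', 'me', 'you')
assert word_sorted_context("you and me") == word_sorted_context("me and you")
assert word_sorted_context("me and you") == word_sorted_context("and me you")
```

Since many surface orderings collapse to one canonical form, the same window budget covers more distinct word sets, which is presumably where the longer effective context comes from.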