
kannanvijayan · yesterday at 1:17 PM

I think this is an attempt to enrich the locality model in transformers.

One of the weird things you do in transformers is add a position vector which captures the distance between the token being attended to and some other token.

This is obviously not powerful enough to express non-linear relationships - like graph relationships.
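
As a rough sketch of what that position vector buys you (an illustration, not anything from the article): a T5-style relative position bias only exposes the signed 1D offset between positions to attention, so linear distance is the only structure it can express.

    import torch

    # Rough illustration (not the article's code): a T5-style relative
    # position bias. The only structure it exposes to attention is the
    # signed 1D offset i - j, i.e. linear distance along the sequence.
    def relative_position_bias(seq_len: int, num_buckets: int = 32) -> torch.Tensor:
        pos = torch.arange(seq_len)
        offsets = pos[:, None] - pos[None, :]                    # entry (i, j) holds i - j
        buckets = offsets.clamp(-num_buckets // 2, num_buckets // 2 - 1) + num_buckets // 2
        bias = torch.nn.Embedding(num_buckets, 1)                # one learned scalar per offset bucket
        return bias(buckets).squeeze(-1)                         # (seq_len, seq_len), added to attention logits

    attn_logits = torch.randn(8, 8) + relative_position_bias(8)  # graph structure has no way in here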

This person seems to be experimenting with pre-processing the input token set: linearly reordering it by some other heuristic that might map more closely to the actual underlying relationships between the tokens.
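
If that reading is right, the preprocessing would look roughly like the sketch below; the similarity heuristic (projecting embeddings onto their leading principal direction) is purely a made-up stand-in, not whatever the author actually uses.

    import torch

    # Hypothetical sketch of the reordering idea; the heuristic here is a
    # made-up stand-in, not the author's method.
    def reorder_by_heuristic(tokens: list[str], embeddings: torch.Tensor) -> list[str]:
        # Sort tokens by their score along the leading principal direction
        # of the embedding matrix, so "related" tokens become neighbours in
        # the new linear order before positions are assigned.
        _, _, v = torch.pca_lowrank(embeddings, q=1)
        order = torch.argsort(embeddings @ v[:, 0])
        return [tokens[int(i)] for i in order]

    tokens = ["you", "and", "me", "graph", "node", "edge"]
    emb = torch.randn(len(tokens), 16)          # stand-in embeddings
    print(reorder_by_heuristic(tokens, emb))    # reordered sequence, then fed to a standard transformer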


Replies

thesz · yesterday at 10:54 PM

> like graph relationships

Once upon a time, back when I was a language modeling researcher, I built and finetuned a big (at the time, about 5 billion parameters) Sparse Non-Negative Matrix Language Model [1].

[1] https://aclanthology.org/Q16-1024/

As this model allows for mixing and matching various contexts, one thing I did was to use a word-sorted context. This effectively transforms a position-based context into a word-set-based context, where "you and me", "me and you" and "and me you" are all the same.

This allowed for longer contexts and better prediction.
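
A toy sketch of the word-sorted context idea (just the order-insensitive context key, not the actual SNM feature extraction):

    # Toy sketch, not the actual SNM feature extraction: sorting the context
    # words makes "you and me", "me and you" and "and me you" collapse to
    # the same order-insensitive context key.
    def sorted_context_key(context: list[str]) -> tuple[str, ...]:
        return tuple(sorted(context))

    assert (sorted_context_key(["you", "and", "me"])
            == sorted_context_key(["me", "and", "you"])
            == sorted_context_key(["and", "me", "you"]))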

tuned · today at 7:08 AM

> This is obviously not powerful enough to express non-linear relationships - like graph relationships.

The distance metric used is based on energy-informed graphs that encode energy relations in a distribution called taumode; see my previous paper on spectral indexing for vector databases for a complete roll-out.

adroniser · yesterday at 2:16 PM

Adding the position vector is basic, sure, but it's naive to think the model doesn't develop its own positional system, bootstrapped on top of the barebones one.
