Thanks for reading. I cannot retrain an existing model because the self-attention mechanism has been completely redesigned. The Keys and Values in this self-attention are stored as scalars, so a latent space built from traditional weight matrices does not make sense in the context of a topological transformer. The two latent spaces might eventually turn out to be roughly equivalent, but they would store entirely different values.
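To make the incompatibility concrete, here is a minimal sketch in plain NumPy. The `scalar_attention` function below is purely my own illustrative assumption of what per-token scalar Keys and Values could look like, not the actual topological-transformer code; it only shows why weights trained for vector-valued K/V have no corresponding place in a scalar-K/V formulation.

```python
import numpy as np

def standard_attention(x, Wq, Wk, Wv):
    """Standard scaled dot-product attention: K and V are d_k-dimensional
    vectors per token, produced by learned weight matrices."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv            # (seq, d_k) each
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # (seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # softmax over keys
    return weights @ V                          # (seq, d_k)

def scalar_attention(x, wq, wk, wv):
    """Hypothetical scalar-K/V variant (illustrative assumption): each token
    is reduced to a single scalar key and a single scalar value, so there is
    no d_k-dimensional latent space for pretrained weights to map into."""
    q = x @ wq                                  # (seq,) scalar query per token
    k = x @ wk                                  # (seq,) scalar key per token
    v = x @ wv                                  # (seq,) scalar value per token
    scores = np.outer(q, k)                     # (seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # softmax over keys
    return weights @ v                          # (seq,) scalar output per token

rng = np.random.default_rng(0)
seq, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq, d_model))

out_vec = standard_attention(x, *(rng.normal(size=(d_model, d_k)) for _ in range(3)))
out_scal = scalar_attention(x, *(rng.normal(size=d_model) for _ in range(3)))
print(out_vec.shape, out_scal.shape)  # (4, 8) vs (4,): incompatible latent shapes
```

The shape mismatch at the end is the point: a pretrained model's `Wk`/`Wv` matrices parameterize a vector latent space that simply has no counterpart when Keys and Values are scalars, so retraining from existing weights is not an option.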