Hacker News

andai · yesterday at 5:23 PM

This is interesting. Has there been more research into this architecture? I hear about it once every few years but it always seems like a niche / experimental thing. But based on the graph in their blog post you'd expect every company to be using this.


Replies

tuned · today at 6:18 AM

This is a novel reinterpretation of the Transformer, based on my previous research with a library called `arrowspace`.

It is roughly what is called a "Grassmann-like flow", but without the Plücker embedding; it is also similar to what DavisTensor does, but it relies on a spectral Laplacian rather than on purely geometric distances.

The problem with much prior work is that it focuses on dense representations. This architecture focuses on sparse representations and provides a new approximation computation based on energy-informed graphs.
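
To make the "energy-informed" idea concrete, here is a minimal sketch (my own illustration, not the `arrowspace` API or the paper's actual method) of scoring a signal by its Dirichlet energy f^T L f over a sparse k-NN graph, i.e. a spectral Laplacian quantity rather than a pure geometric distance:

```python
# Hedged sketch: one way to score a signal by spectral "energy" on a
# sparse graph instead of by raw geometric distance. Names and setup
# are illustrative only.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16))  # 100 items, 16-dim features

# Sparse k-NN adjacency over the items, symmetrized.
A = kneighbors_graph(X, n_neighbors=5, mode="connectivity")
A = ((A + A.T) > 0).astype(float)

# Graph Laplacian L = D - A.
L = laplacian(csr_matrix(A))

# Dirichlet energy of a signal f over the graph: f^T L f.
# Low energy means f varies smoothly along graph edges.
f = X[:, 0]  # e.g. one feature treated as a graph signal
energy = float(f @ (L @ f))
print(f"Dirichlet energy of feature 0 over the k-NN graph: {energy:.3f}")
```

Because the graph is sparse, the energy term only sums over existing edges, which is what makes this kind of score cheap on sparse representations.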