Hacker News

oofbey, yesterday at 4:57 PM

Depending on how different the attention mechanism is, that might not work. If it's just a faster or different way of finding the tokens to attend to, sure. But I get the sense the author is implying this method changes the semantics somehow. Although tbh I didn't follow it entirely.
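Here is a rough sketch of the "just a different way of finding the tokens" case. It assumes standard single-query scaled dot-product attention, and it does not claim to be the method under discussion. `softmax`, `attend` and `topk_attend` are hypothetical helpers written for illustration. The top-k variant changes only *which* keys get attended to, and the per-token math stays the same softmax-weighted sum. With k equal to the sequence length it reproduces full attention exactly. With a smaller k it approximates full attention, because dropped keys simply lose their share of the weight. A method that changes the scoring or the aggregation itself would not have this drop-in property.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(q, keys, values, idx=None):
    # Hypothetical helper: scaled dot-product attention for one query,
    # restricted to the key/value positions in idx (all positions by default).
    idx = list(range(len(keys)) if idx is None else idx)
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, keys[i])) / math.sqrt(d) for i in idx]
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, idx):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out

def topk_attend(q, keys, values, k):
    # Hypothetical "find the tokens first" variant: pick the k keys with the
    # highest raw dot product, then run the same attention over only those.
    raw = [sum(a * b for a, b in zip(q, key)) for key in keys]
    idx = sorted(range(len(keys)), key=lambda i: raw[i], reverse=True)[:k]
    return attend(q, keys, values, idx)

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

full = attend(q, keys, values)
exact = topk_attend(q, keys, values, k=3)   # same tokens, same semantics
approx = topk_attend(q, keys, values, k=2)  # fewer tokens, close but not equal
print(full, exact, approx)
```

The point of the example is that a retrieval-style speedup leaves the attention function untouched, so it can be swapped in with only an approximation error. A mechanism with different semantics has no such guarantee.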