Hacker News

ode, yesterday at 11:31 PM

Do we know why?



hammeiam, yesterday at 11:47 PM

Sparse attention. It's the highlight of this model, per the paper.
