logoalt Hacker News

cubefoxtoday at 3:43 PM2 repliesview on HN

I think DeepSeek V3.2 is sub n^2, but it clearly performs quite well, refuting the alleged lower bounds in the paper.


Replies

andy12_today at 4:13 PM

It really isn't sub N^2. The main attention is only O(Nk), but only thanks to a lightning indexer that still has complexity O(N^2). So overall it still has the same complexity; just with a smaller constant factor [1]

> DSA reduces the core attention complexity of the main model from O(L^2) to O(Lk), where k (<< L) is the number of selected tokens. Although the lightning indexer still has a complexity of O(L^2), it requires much less computation compared with MLA in DeepSeek-V3.1-Terminus

[1] https://arxiv.org/pdf/2512.02556

show 1 reply