Hacker News

embedding-shape, today at 3:14 PM

> SubQ 1.1 Small scores near-perfect at 1M, 2M, 6M, and 12M tokens. The model was trained predominantly at 1M tokens yet the retrieval held near perfectly at 12x that length, despite compressing attention to just 0.13% of relationships. This generalization is a direct consequence of SSA routing attention based on content relevance rather than fixed positional patterns.

If the results persist from 1M to 12M, why not 24M or 48M? Sounds almost too good to be true.

Doing back-of-the-napkin math in my head, that'd be something like 0.5 to 1 million LOC, depending on language and code density. You could just fold the entire codebase into one prompt if it's a small one, which would be neat :)
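
A rough sketch of that napkin math, assuming "that" means the 12M-token context and assuming somewhere between 12 and 24 tokens per line of code. Those per-line figures are not measured; they are just the values that reproduce the 0.5 to 1 million range, and the function name is made up for illustration.

```python
def estimated_loc(context_tokens: int, tokens_per_line: float) -> int:
    """Estimate how many lines of code fit in a context window."""
    return int(context_tokens / tokens_per_line)


# Assumption: the 12M-token context from the quoted claim.
CONTEXT_TOKENS = 12_000_000

# Assumption: dense code averages ~12 tokens per line, verbose code ~24.
for tokens_per_line in (12, 24):
    loc = estimated_loc(CONTEXT_TOKENS, tokens_per_line)
    print(f"{tokens_per_line} tokens/line -> ~{loc:,} LOC")
```

Real tokens-per-line numbers vary a lot by tokenizer and language, so treat the range as an order-of-magnitude guess.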


Replies

monster_truck, today at 4:02 PM

It likely falls off very steeply after that. 8 to 1 (which I am assuming based on the 0.13% figure) is a pretty common ratio for sparse matrix stuff.