Linear-time models are bad at in-context retrieval, which limits their performance on a range of tasks, so a pure linear model isn't currently feasible anyway, at least for language modeling. Instead the authors recommend interleaving linear and full attention layers. Presumably this mostly solves the performance problem (at least on benchmarks), but it also means the hybrid architecture is no longer truly linear: the remaining attention layers still cost O(n²) in sequence length, just with a much smaller constant. It will still be faster and less memory-hungry at long context than a pure transformer, though.
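
For concreteness, here's a minimal PyTorch sketch of the interleaving idea: a stack that is mostly kernel-based linear attention (the ELU feature-map trick from Katharopoulos et al., "Transformers are RNNs") with standard softmax attention every few layers. The layer names, the 1-in-4 mixing ratio, and the non-causal simplification are all illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """Kernel-based attention: replacing softmax(QK^T)V with
    phi(Q)(phi(K)^T V) lets the K/V sum be computed once, making
    the cost O(n * d^2), i.e. linear in sequence length n.
    Non-causal form, for brevity."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Positive feature map phi(x) = elu(x) + 1
        q, k = F.elu(q) + 1, F.elu(k) + 1
        # phi(K)^T V is a fixed-size (d x d) summary, independent of n
        kv = torch.einsum("bnd,bne->bde", k, v)
        # Normalizer: phi(Q) . sum_n phi(K)
        z = 1 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
        return self.out(torch.einsum("bnd,bde,bn->bne", q, kv, z))

class FullAttention(nn.Module):
    """Standard O(n^2) softmax attention."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

class HybridStack(nn.Module):
    """Mostly linear layers, with full attention every `ratio`-th layer."""
    def __init__(self, dim, depth=8, ratio=4):
        super().__init__()
        self.layers = nn.ModuleList(
            FullAttention(dim) if (i + 1) % ratio == 0 else LinearAttention(dim)
            for i in range(depth)
        )

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)  # residual connection
        return x

x = torch.randn(2, 128, 64)          # (batch, seq_len, dim)
print(HybridStack(dim=64)(x).shape)  # torch.Size([2, 128, 64])
```

With a 1-in-4 ratio like this, the quadratic term only applies to a quarter of the layers, which is where the long-context speed and memory savings over a pure transformer come from, even though the asymptotic complexity stays quadratic.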