andes314 • today at 3:56 PM • 1 reply

Linear-time attention doesn’t work, in principle. It’s a dead-end pursuit. There’s much great research on more efficient quadratic-time inference.

smokel • today at 4:57 PM

What about n log n?
