Hacker News

liteclient · yesterday at 1:17 PM

it makes sense architecturally

they replace dot-product attention with topology-based scalar distances derived from a Laplacian embedding; that effectively reduces attention scoring to a 1D energy comparison, which can save memory and compute
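roughly what i mean, as a toy numpy sketch — the token graph, the Fiedler-vector choice, and the absolute-difference "energy" are my guesses at the details, not the paper's actual method:

```python
# Toy contrast of the two scoring schemes, as I understand the idea.
# NOTE: my own reconstruction, not the paper's code; the token graph,
# the Fiedler vector, and the |.| "energy" are assumptions.
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dot_product_scores(q, k):
    # standard attention: a d-dimensional dot product per (query, key) pair
    return (q @ k.T) / np.sqrt(q.shape[-1])

def laplacian_1d_scores(adj):
    # graph Laplacian L = D - A over some token graph
    lap = np.diag(adj.sum(axis=1)) - adj
    # eigh returns eigenvalues ascending; column 1 is the Fiedler vector
    # (assumes a connected graph), i.e. one scalar coordinate per token
    _, eigvecs = np.linalg.eigh(lap)
    coords = eigvecs[:, 1]
    # score = negative 1D distance: one subtraction per pair instead of a
    # d-dimensional multiply-accumulate
    return -np.abs(coords[:, None] - coords[None, :])

rng = np.random.default_rng(0)
n, d = 6, 16
q, k = rng.normal(size=(n, d)), rng.normal(size=(n, d))
adj = (rng.random((n, n)) < 0.4).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 0)  # undirected graph, no self-loops

print(softmax(dot_product_scores(q, k)))   # baseline (n, n) attention weights
print(softmax(laplacian_1d_scores(adj)))   # weights from scalar distances only
```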

that said, i’d treat the results with a grain of salt given there is no peer review, and benchmarks are only on a 30M-parameter model so far


Replies

tuned · today at 7:10 AM

right. this is a proposal that needs to be tested. I started testing it at 30M parameters; then I will move to 100M and evaluate the generation on domain-specific assistant tasks

reactordev · yesterday at 2:10 PM

Yup, the keyword here is “under the right conditions”.

This may work well for their use case but fail horribly in others; without peer review and further testing, there’s no way to know.
