logoalt Hacker News

jchandralast Sunday at 11:57 AM0 repliesview on HN

In this prototype, OLS + SVD isn’t per-token, it runs only when the recycle bin fills (amortized over multiple tokens).

That said, it’s still heavier than Top-K. I haven’t benchmarked end-to-end latency yet; this is mainly exploring the accuracy vs memory tradeoff.