High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction

15 points • by jchandra • last Sunday at 11:35 AM • 1 comment

vivahir215 • last Sunday at 11:50 AM

Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
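
For rough context on that gap, here is a back-of-envelope cost comparison. It is only a sketch under assumed shapes: a cache of n tokens with head dimension d, and k retained tokens used as the regression design, with n ≥ d and n ≥ k. These symbols are illustrative assumptions; the thread does not describe the submission's actual pipeline, so the constants and even the dominant term may differ in practice.

```latex
\begin{align*}
\text{Top-}k \text{ selection over } n \text{ scores (heap)} &: O(n \log k) \\
\text{Thin SVD of an } n \times d \text{ matrix} &: O(n d^{2}) \\
\text{OLS via normal equations, } n \times k \text{ design, } d \text{ targets} &: O(n k^{2} + n k d)
\end{align*}
```

Under these assumptions the selection step is close to linear in n, while the SVD and OLS steps pick up extra factors of d² or k², which is why an end-to-end latency benchmark matters more than asymptotic counts alone.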


jchandra • last Sunday at 11:36 AM

[dead]
