Hacker News

woodson, today at 2:51 PM

Look into RWKV.


Replies

JohannaAlmeida, today at 3:04 PM

Yeah, RWKV is definitely related in spirit (a recurrent state for long context). Here I'm combining local windowed attention with a gated recurrent path plus KV cache compression, so it's more of a hybrid than a full replacement for attention.
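A minimal Python sketch of what one such hybrid layer could look like, under loud assumptions. The class `HybridLayer`, the identity q/k/v projections, the per-channel sigmoid gate, and the use of plain sliding-window eviction as a stand-in for "KV cache compression" are all hypothetical; the actual design isn't described in the thread.

```python
import math
from collections import deque


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class HybridLayer:
    """Hypothetical hybrid block: sliding-window attention + gated recurrent state."""

    def __init__(self, dim, window):
        self.dim = dim
        # Bounded KV cache: deque(maxlen=...) evicts the oldest (key, value) pair.
        # Assumption: simple eviction stands in for a real KV compression scheme.
        self.kv = deque(maxlen=window)
        # Fixed-size recurrent state that summarizes the whole history.
        self.state = [0.0] * dim

    def step(self, x):
        # Identity projections keep the sketch small (assumption, not learned weights).
        q, k, v = x, x, x
        self.kv.append((k, v))

        # Local attention: scaled dot-product over the cached window only.
        scale = 1.0 / math.sqrt(self.dim)
        weights = softmax([dot(q, kk) * scale for kk, _ in self.kv])
        local = [
            sum(w * vv[i] for w, (_, vv) in zip(weights, self.kv))
            for i in range(self.dim)
        ]

        # Gated recurrent path: per-channel gate g decides how much old state to keep.
        for i in range(self.dim):
            g = sigmoid(x[i])
            self.state[i] = g * self.state[i] + (1.0 - g) * v[i]

        # Combine both paths by simple addition (one of many possible mixing choices).
        return [local[i] + self.state[i] for i in range(self.dim)]


layer = HybridLayer(dim=2, window=4)
for _ in range(50):
    out = layer.step([1.0, -1.0])
print(len(layer.kv), [round(o, 3) for o in out])
```

The point of the split is that memory stays constant: the window caps the attention cache, while the recurrent state carries a lossy summary of tokens that have already been evicted, which is the part that overlaps with RWKV.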