pixelpoet, today at 6:31 AM

IIRC llama.cpp doesn't implement DSv4's compressed attention mechanism. This project does use (credited) parts of llama.cpp, but it's focused on this great model for now. Much of this is covered better in the repo's README.


Replies

rnewme, today at 9:33 AM

The repo's README and antirez's Reddit comments also expressed a willingness to upstream.