Hacker News

PeterStuer, last Sunday at 11:53 AM

Because most of the people squeezing a heavily quantized small model onto their consumer GPU don't realize they have left no room for the activations and the KV cache, so they end up stuck with a very small context.
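
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a standard transformer KV cache (keys plus values, per layer, per KV head) stored in fp16, and it ignores activation scratch buffers and runtime overhead, so real headroom is smaller still. The model shape (32 layers, 8 KV heads, head dimension 128), the 8 GiB card, and the 7.5 GiB of quantized weights are all made-up illustrative values, not measurements of any particular model.

```python
GIB = 1024 ** 3


def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache needed for one token of context.

    The factor 2 covers keys and values, which are stored for every
    layer and every KV head. bytes_per_elem=2 assumes an fp16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(vram_bytes, weight_bytes, per_token_bytes, overhead_bytes=0):
    """Upper bound on context length that fits in what the weights leave free."""
    free = vram_bytes - weight_bytes - overhead_bytes
    return max(free, 0) // per_token_bytes


# Hypothetical model shape and card size, chosen only for illustration.
per_token = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
tokens = max_context_tokens(8 * GIB, int(7.5 * GIB), per_token)

print(per_token)  # 131072 bytes, i.e. 128 KiB per token
print(tokens)     # 4096
```

Under those assumptions, filling 7.5 GiB of an 8 GiB card with weights leaves room for only about 4K tokens of context, and that is before counting anything else the runtime needs.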