Hacker News

hypfer, yesterday at 5:57 PM

That math (250k context, Q4 model, 24GB VRAM) only checks out if the K/V cache is also quantized to q4, which is probably not the best idea.
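
For anyone who wants to redo the arithmetic, here is a rough sketch of how K/V cache size scales with context length and element width. The model dimensions (48 layers, 8 K/V heads, head dimension 128) are hypothetical placeholders and not the model under discussion, so plug in the real values from the model config. It also ignores the per-block scale overhead that real quantized cache formats add, so the q4 figure is slightly optimistic.

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bits_per_element):
    """Approximate K/V cache size in GiB for a single sequence."""
    # K and V each hold n_kv_heads * head_dim values per layer per token.
    values = 2 * n_layers * n_kv_heads * head_dim * context_len
    return values * bits_per_element / 8 / 2**30


# Hypothetical model dimensions (assumption, not taken from the thread).
dims = dict(n_layers=48, n_kv_heads=8, head_dim=128, context_len=250_000)

for name, bits in [("f16", 16), ("q8", 8), ("q4", 4)]:
    print(f"{name}: {kv_cache_gib(**dims, bits_per_element=bits):.1f} GiB")
```

Whatever the real dimensions are, the cache shrinks linearly with bits per element, so going from f16 to q4 cuts it to a quarter. That is the only lever here that can move a 250k-token cache from far over the VRAM budget to something that fits next to the weights.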