Hacker News

Tepix, yesterday at 2:59 PM

Sounds good. I saw that you use the FP8 version of the model. Do you also quantize the KV cache?


Replies

sacrelege, yesterday at 5:02 PM

No, I don't, since there seems to be a silent degradation bug.