zamadatix • 01/21/2025 • Hacker News

I noticed that the smaller the model (whether from heavier quantization or a lower parameter count), the faster it would run, but the longer it would fight itself before settling on an answer. For the same Calc II-level problem, all of the models eventually got an answer, but the distilled Qwen-32B at Q6 quant was the fastest to actually finish its answer.
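
A rough way to see why the fastest-running model is not necessarily the first to finish: time to a completed answer is roughly tokens generated divided by tokens per second, so a model that runs quickly but rambles can still lose. The sketch below uses made-up numbers purely for illustration. They are not measurements from any of the models mentioned, and `time_to_answer` is a hypothetical helper.

```python
def time_to_answer(tokens_generated: int, tokens_per_second: float) -> float:
    """Approximate wall-clock seconds until the model finishes its answer."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens_generated / tokens_per_second


# Hypothetical figures, chosen only to illustrate the trade-off.
# Smaller model: high throughput, but a long chain of self-correction.
small = time_to_answer(tokens_generated=12_000, tokens_per_second=60.0)
# Larger model: lower throughput, but a much shorter path to the answer.
large = time_to_answer(tokens_generated=3_000, tokens_per_second=20.0)

print(f"small model: {small:.0f} s, large model: {large:.0f} s")
```

With these assumed inputs the slower-running model finishes first, because it needs a quarter of the tokens while running at a third of the speed.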
