I have a Supernote and have been looking at different models for handwriting recognition, and I agree that gemma4-26B is the best I’ve tried so far (better than Qwen3-VL-8B and GLM-OCR). Besides turning off thinking, does your setup have any special sauce?
Q8 or Q6_UD, with no KV cache quantization. I swear KV cache precision matters even more with MoE models that have few activated parameters, despite the reportedly minimal KL divergence from quantizing it.
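For anyone wanting to reproduce this, here's roughly what that setup looks like as a llama.cpp `llama-server` invocation (the model filename is a placeholder; `f16` is already the default KV cache type, but it's shown explicitly here to make the "no KV cache quantization" part clear):

```shell
# Q8_0 (or an Unsloth dynamic Q6) weight quant, KV cache left unquantized (f16)
llama-server \
  -m model-Q8_0.gguf \
  --cache-type-k f16 \
  --cache-type-v f16 \
  -ngl 99
```

The point being: avoid flags like `--cache-type-k q4_0` even though they save VRAM, since the quality hit on the KV cache is what's being blamed here.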